Abstract

A machine that can read unconstrained words remains an unsolved problem. For example, automatic entry of handwritten documents into a computer has yet to be accomplished. Most systems attempt to segment the letters of a word and read words one character at a time. Segmenting a handwritten word is very difficult, and the confidence of the results is often low. Another method, which avoids segmentation altogether, is to treat each word as a whole. This research investigates the use of Fourier Transform coefficients, computed from the whole word, for the recognition of handwritten words. To test this concept, the particular pattern recognition problem studied consisted of classifying four handwritten words: 'Buffalo', 'Vegas', 'Washington', and 'City'. Several feature subsets of the Fourier coefficients are examined. The best recognition performance of 76.2% was achieved when the Karhunen-Loeve transform was computed on the Fourier coefficients.
Handwritten Word Recognition Based on Fourier Coefficients


I. Introduction

1.1  Background 

Virtually any company or organization spending large sums of money processing documents is interested in developing a system to recognize unconstrained words, i.e., a reading machine. For example, the United States Postal Service funds research to develop an automated system for reading handwritten addresses on mail. Similarly, banking institutions want a system to read the handwritten amount on a check. Further, the Census Bureau, which processes approximately 250 million forms every ten years, is searching for an automated system to read the occupation block on its form. A system capable of reading handwritten words has tremendous value to these organizations. Such a system would also fit the needs of the Air Force, because thousands of jobs are dedicated to document processing and current Department of Defense policy is to downsize the workforce.

Handwritten character and word recognition has been intensely investigated for the past 50 years. L.A. Pintsov states that although many algorithms have been developed to recognize handwritten characters, few new ideas on how to solve the problem have come about in the past thirty years (20). This comment indicates the difficulty of machine recognition of unconstrained text.

What is the best recognition that we can expect a machine to achieve? The human recognition error rate for unconstrained isolated characters is about 4 percent, based on widely accepted experimental data (20). For unconstrained isolated words it may be even greater.

1.2  Problem  Statement 

In broad terms, a machine that can read unconstrained 'words' is undeveloped. Specifically, it is not yet possible for a machine to read handwritten text by recognizing the entire word. This research effort investigates the use of Fourier transform coefficients as features that can be used to classify a group of handwritten words.


1.3  Summary  of  Current  Knowledge 

Recognition of 5000 machine-typed words is possible with 99% accuracy using Fourier coefficients, as demonstrated by O'Hair (16). The Service de Recherche Technique de la Poste has developed a device to recognize the handwritten amount on postal checks (9). The reported recognition rate was 79% on a test set of 2492 words from 27 classes using Hidden Markov Models (9). No rejection rates were reported.

1.4  Research  Objectives 

This  thesis  will  investigate  a  means  to  classify  a  group  of  handwritten  words.  The 
specific  objectives  of  this  research  effort  are  as  follows: 

1. Develop a method of calculating features based on the 2-dimensional discrete Fourier Transform.

2.  Investigate  the  use  of  the  F-ratio  for  feature  subset  selection. 

3. Investigate using the Karhunen-Loeve transform as a means of feature set reduction.

4.  Test  several  feature  subsets  of  Fourier  coefficients  using  the  multi-layer  perceptron 
and  k-nearest  neighbor. 

5.  Develop  a  method  to  incorporate  an  “add-on”  feature  selection  procedure. 

1.5  Approach  and  Methodology 

This thesis will use data of handwritten words provided by the Center of Excellence for Document Analysis and Recognition, State University of New York at Buffalo (15). The database contains handwritten cities, states, ZIP codes, and others; however, only the handwritten cities will be used. The data is limited, so only a four-class problem will be attempted. Additional data will be collected from people working in the laboratory. The training set and test set will include 200 patterns each.

A variety of feature extraction algorithms are examined to produce a suitable feature set for classification.

1.5.1 Fourier Analysis. The use of Fourier coefficients to classify whole machine-typed words has proven very successful in past research (17). This study will pursue the use of Fourier coefficients in the recognition of handwritten words. The thrust of the research will be searching for the best subset of features among the Fourier coefficients calculated from the 2-dimensional discrete Fourier Transform.

1.5.2 Handwritten Word Classification. The primary classification techniques that will be used are the k-nearest neighbor and the multi-layer perceptron. The k-nearest neighbor classifier is a quick and easy way to classify a pattern. A sample is assigned to a class based on the classes of its k nearest neighbors (5). The multi-layer perceptron is a means of separating data that are not linearly separable (23).

1.6  Thesis  Organization 

This thesis is organized as follows: Chapter II discusses past research in the area of whole word recognition. Chapter III describes the method of this study. Chapter IV reports the results of the study. Chapter V discusses the results and conclusions.

1.7  Summary 

The recognition of machine-typed words has proven very successful (16); however, the recognition of handwritten words has had little success. This thesis will attack the difficult problem of machine recognition of unconstrained handwritten words based on features calculated from the Fourier Transform of the image of the word. Several methods of feature subset selection will be employed.
II. Literature Review


2.1 Introduction

This literature review examines the background information relevant to this study of handwritten word recognition. The current techniques used in handwritten word recognition are discussed, as well as the justification for using the whole word to avoid character segmentation. Finally, a brief review of the classification techniques used in this study is presented.

2.2  Background 

Much is known about where information is processed in the brain. The human visual system has been extensively mapped (25). The visual system compresses visual images at a ratio of 100 to 1, which leads researchers to believe that only essential information is needed to construct a visual model of the world (23). The big problem is that it is not known exactly how the information is compressed. Kabrisky has theorized that this compression can be modeled in part by computing the Fourier Transform of visual images (11). There is evidence that this may be exactly what is going on (23, 2). A few examples show the correlation between Fourier distances and human psychological tests. The famous animal cracker test discussed in Dr. Rogers' book showed that the relative closeness of pairs of animals as rated by humans corresponded to their respective distances in Fourier space (23). Similarly, Bush generated a better set of characters for pilots to see, especially under stressful conditions (2). Using coefficients computed from the Fourier Transform, he compared the nearest neighbor distances in Fourier space to the letters that were misclassified by humans and discovered that the mistakes were related to the distance. When the font style was changed to spread out the distances in Fourier space, humans made fewer mistakes (2). Further, Ginsburg showed that many visual illusions work not only for humans, but also for the Fourier Transform (23).

Although every attempt is made to simulate what is happening in the human brain, one must face the fact that at this point, all one can do is attempt to classify patterns based on numbers. As Rogers says in his book, “Since nobody understands how brains do any processing of significance, statements that these artificial neural networks work as brains do are Lies, Lies, all Lies!!” (23).

2.3  Difficulty  in  Segmenting  Characters  of  Words 

Trying to separate the individual characters of a word is extremely difficult when the word is handwritten, especially in script. Any person can easily read this text and pick out the characters that make up each word. Even neatly written cursive script can easily be read. But those tasks are very difficult for a computer or machine to do. For example, in typewritten text of the type found in magazines, when an ‘r’ and ‘n’ are side by side, the machine may confuse the word ‘modern’ for ‘modem’ (24). Another difficulty can occur when non-text information appears in the image (3). For instance, if a word or group of words is underlined, such as the title of a book, a machine could mistake the underlining as part of each character. If the word ‘The’ is in the title and underlined, the machine could mistake the ‘T’ for an ‘I’. Finally, a machine could encounter difficulties in reading text when letters overlap in terms of their defined space, an occurrence known as kerning. Letter overlap commonly occurs in cursive or italic writing where a capital letter is followed by a lower case letter that sits directly beneath a portion of the capital letter. This raises the question: is there any way to avoid these difficulties?

2.4  Current  Methods  of  Handwritten  Word  Recognition 

In his 1985 thesis, O'Hair proposed treating words as single symbols to avoid the difficult segmentation problem of separating each word into its individual characters (16). His experiments used typewritten text. By collecting features of the words from the lower harmonics of the 2-dimensional discrete Fourier transform and classifying them with the k-nearest neighbor algorithm, he achieved a recognition rate of 94%. He continued this work, expanding his database of typewritten words to include words using all fonts and various character separations. Several distance metrics were used to measure the features against a template of the word. He achieved a 99% recognition rate on 5,000 words (17).

Srihari uses two methods for word recognition (27). One of the methods is based on segmenting the word into characters and identifying the characters (27). Using this method, they achieved a 92.0% recognition rate on a ten class problem with 3,000 words (27). The second method they used is to break the word up into segments and recognize each segment using Hidden Markov Models (27). The recognition rate on the same set of data was 93% (27). They also performed tests interpreting handwritten addresses on mail, achieving a current performance of 44% with 6% error (27).

Tin Kam Ho, et al. experimented with machine-typed whole word recognition (10). They used a combination of features based on character segmentation and the use of the whole word (10). This was one of the first attempts at using a combination of techniques to recognize words (10). The features they used in the whole word analysis were based on convolving a 7x7 template with the image. This resulted in a 1280-dimensional feature vector. Secondly, stroke direction was used, resulting in a 160-dimensional feature vector. Four stroke directions were used: east-west, northeast-southwest, north-south, and northwest-southeast. Then a nearest neighbor classifier was used to classify the word. This decision was then combined with results from five other independent classifiers to get a consensus ranking. The experimental results showed an 88.9% recognition rate with 1671 words in a 33,850-word lexicon. This is a good example of combining independent classifiers.

Researchers in France are currently developing a real world system to read the amount on a postal check (13). They compare the handwritten word amount to the handwritten digit amount to verify the amount on the check. In the recognition of the handwritten word amount, they fuse two methods of recognition. The two methods are based on whole word recognition and on segmentation of the whole word into characters. The whole word method uses dynamic time-warping. By looking for vertical lines, loops, horizontal lines, and dots, they build a representation of the word to compare against the reference codebook words. The fusion process compares the output classification of the whole word method to that of the character segmentation method. In the final process, the digit amount is compared to the handwritten word amount and the final amount is determined. As of 1991, the recognition rate was 40% on a set of 6,400 samples. This example illustrates the difficulty of handwritten word recognition on real world data.


Some research has been done in trying to characterize the variability in cursive handwriting (21). Some of the characteristics examined were dots, dashes, writing slant, zones and baselines, zone heights, local minima and maxima, concavity and curvature, loops or spikes, cusps, and the way letters are connected. The conclusion was that it is very difficult to characterize styles of handwriting. Recognition rates for four different styles of handwriting were 89.5%, 82.0%, 49.0%, and 21.0%. Again, the appropriate features continue to elude researchers today.

Hidden Markov Models, which have been successful in speech recognition, are now being applied to handwritten word recognition. Hidden Markov Models are effective in lowering the sensitivity to many variations in styles of handwriting (7). This is another technique that can avoid the difficult segmentation stage (7). Bertille and Yacoubi used Hidden Markov Models to recognize postal codes without segmentation (1). They achieved recognition rates ranging from 28% to 59% for 14 and 16 states respectively (1). Other research using Hidden Markov Models has also been reported. Gilloux, who heads the Service de Recherche Technique de la Poste, developed a prototype system to recognize the handwritten amount on a check (9). The words are divided into segments, which are then recognized using Hidden Markov Models. The results reported a 79% recognition rate on a test set of 2492 words of 27 classes (9). They also reported that the system had poor generalization capabilities (9).

All the methods described so far involve using some type of algorithm or combination of algorithms. J.C. Simon describes several principles which allow for more robust algorithms in recognizing handwritten words (26). The principles he applies are as follows: 1) use several levels of recognition, where each level is assigned a probability of occurrence, 2) use independent sources of information and estimate joint performance, and 3) use feedback between each level of recognition (26). These principles are based on neurophysiological evidence (26). The results reported from applying these principles showed a 79.5% recognition rate on a lexicon of 25 words (26).
2.5 Feature Extraction


All pattern recognition people will say that good features make for good recognition. Finding features that distinguish one pattern from another is the key to a viable pattern recognition system. “Most papers give no reasons for the choice of features. In fact, most features in pattern recognition work are chosen on the grounds that the choice is intuitively reasonable. Once the features have been chosen authors apply sophisticated statistical methods in order to minimize errors. In most cases, however, the game has been lost with the choice of features.” (20). “There is an unfortunate tendency in many articles describing laboratory systems to describe the classification method in detail, and to give inadequate or no information about the features used as the basis for classification, how they are measured, and most importantly, how they were derived and why they were chosen over other feature metrics.” (20). Features are not usually defined in terms of style variations or distortion (20). It seems reasonable that much effort should be placed in developing good features. Of course, this is the hard part, and it is why this problem has been around for so long, especially when some people believe that “There is no unique computational procedure which can “extract” the identity of a character from its image.” (20). If this is true, then computing the Fourier transform will be of no value. That is what this study intends to find out.

If it is true that no computational procedure can extract the identity of a character or a word, then the other option is to use the raw data. According to Fukunaga, “... as long as features are computed from the measurements, the set of features cannot carry more classification information than the measurements.” (6). Raw measurements in the case of the image of a word are raw pixel values. These pixel values become the features.

Pintsov discusses two types of models which can describe a character. The first, called generative, considers the path that a writing instrument follows when generating a character. In this case, how the character is formed is what is analyzed. The second, called transformative, assumes an ideal form of the character, so the shape itself is analyzed in this case. It is not known exactly what the ideal character form is. Because there is no formal definition of a character, one must rely on data from human recognition and perception. Points where the contour changes direction are much more informative than points on flat portions.

What happens in our visual system when we see an image? When looking at words, or anything for that matter, the brain's neurons are firing to produce the illusion in the brain of the scene being viewed. The image falling on the retina is compressed by a factor of 100:1 as it leaves the retina, and this data is enough information to construct the word in the brain (23). At this point, the compressed data is the feature set used by the brain. However, the image that falls on the retina may contain other information that is not part of the word being looked at. So the actual compression of the word may not be 100:1, but it certainly is reduced by some amount. For a computer, at this point, the image of the word is represented by an array of pixel values and no other information. If we are to compress this information, how do we do it? A starting place may be to binarize the image.

Recognizing a handwritten word is similar to recognizing, say, a hand, a foot, or a nose: even though there are many types of variation, the object or word is the same. The brain somehow extracts constant features. The primary visual cortex processes color, form, and motion in separate areas (25). So there is more information contained in a handwritten word, such as color and whether it is moving or not.

If we are searching for features based on evidence of features used by humans and animals, then, “The obvious place to start is to emulate the ruthless preprocessing that is accomplished by our biological sensors.” (23)

2.6  Previous  Success  using  Fourier  Features 

A lot of work has been done in the field of pattern recognition using Fourier coefficients as features, especially in trying to classify machine printed numbers, letters, and words. Radoy achieved success in recognizing machine generated letters of the alphabet (22). The features he used to classify the letters came from computing the 2-dimensional discrete Fourier Transform. Tallman tried to relate the classification of visual images by the human visual system to a digital simulation based on spatial filtering (29). He extracted the low order Fourier frequency components and was able to recognize handwritten letters of the alphabet with 95.8 percent accuracy using a 7x7 low pass frequency filter. More recently, O'Hair achieved a successful recognition rate by classifying 5000 machine-printed words using low order Fourier coefficients (17). The continued success of the Fourier Transform in classifying letters of the alphabet and machine-typed words makes the Fourier Transform an obvious choice to use on handwritten words.

2.7 Classification Techniques

2.7.1 k-nearest neighbor. The k-nearest neighbor method for classification uses the rule that a sample is assigned a class based on the classes of its k nearest neighbors (5). In this study, each pattern occupies a point in Fourier space. A Euclidean distance is calculated to find the closest neighbors.

The program used to run the k-nearest neighbor algorithm is called LNKnet (12). The training portion of the program stores all the input patterns. The testing portion of the program computes the Euclidean distance from the input pattern to all the stored patterns. “The class selected for the test pattern is the one with a plurality of the classes of the k nearest neighbors.” (12) Appendix A contains a tutorial for using LNKnet.
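
The store-then-vote procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not the LNKnet implementation: the function name `knn_classify` and the toy two-dimensional feature vectors are hypothetical, and tie-breaking is simply left to whatever `Counter.most_common` returns.

```python
import math
from collections import Counter


def knn_classify(train_patterns, train_labels, test_pattern, k):
    """Assign test_pattern the plurality class among its k nearest stored patterns."""
    # "Training" is just storage; all of the work happens at test time.
    distances = []
    for pattern, label in zip(train_patterns, train_labels):
        # Euclidean distance between two feature vectors of equal length.
        distances.append((math.dist(pattern, test_pattern), label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy example with made-up feature vectors (not real Fourier features).
patterns = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["buffalo", "buffalo", "vegas", "vegas"]
print(knn_classify(patterns, labels, [0.2, 0.1], k=3))
```

In the study itself each stored pattern would be a vector of Fourier features rather than a two-element list.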

2.7.2 multi-layer perceptron. The multi-layer perceptron is a method that develops a non-linear discriminant function during training that separates the training data. The architecture of the network consists of three layers: an input layer, a hidden layer, and an output layer. The input layer has as many inputs as the number of features for each pattern, while the number of output nodes is determined by the number of classes to be recognized. A weighted sum of the inputs is passed through a sigmoid function at each hidden node; likewise, a weighted sum of the hidden layer outputs is passed through a sigmoid function at each output node. During training, the weights are updated to minimize the error at the output layer. A gradient descent method is used to update the weights through back propagation. For testing, the class corresponding to the highest output node value is taken as the class of the pattern (12).
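
The forward (testing) pass described above can be sketched as follows. This is a minimal sketch, not the LNKnet code: the weights are made-up numbers rather than trained values, the bias terms are an added assumption not mentioned in the text, and the names `sigmoid`, `layer`, and `mlp_classify` are hypothetical. Training by back propagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic squashing function applied at each hidden and output node."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of its inputs, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def mlp_classify(features, w_hidden, b_hidden, w_out, b_out):
    """Return (index of the highest output node, list of output node values)."""
    hidden = layer(features, w_hidden, b_hidden)
    outputs = layer(hidden, w_out, b_out)
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return winner, outputs


# Two features, two hidden nodes, two classes; all weights are illustrative.
winner, outputs = mlp_classify(
    [1.0, 0.0],
    w_hidden=[[2.0, 0.0], [0.0, 2.0]], b_hidden=[0.0, 0.0],
    w_out=[[1.0, -1.0], [-1.0, 1.0]], b_out=[0.0, 0.0],
)
print(winner, outputs)
```

For the four-class problem studied here, the output layer would have four nodes and the input layer one node per Fourier feature.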
2.7.3 Fusion of Classifiers. The state of the art pattern recognition techniques used in Japan today are based on what they call “multi-expert” recognition (pg 8). Multi-expert recognition enhances the overall recognition by combining several independent methods of recognition. A character recognition competition was held in 1992, and the three highest scoring algorithms were combined and used to classify the test set used during the competition. The test set included 10,000 samples of the digits 0-9. The results showed an improvement from 96.2% to 99%.

Guerts reported a 4.3% increase in recognition of targets by fusing independent classifiers acting on the same data.
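
The text does not say how the individual decisions were combined in either of these results. As one simple possibility, a plurality vote over the labels returned by independent classifiers could look like the sketch below; the function name `fuse_by_vote` is hypothetical, and this is not claimed to be the scheme used in the competition or by Guerts.

```python
from collections import Counter


def fuse_by_vote(decisions):
    """Return the label chosen by the largest number of classifiers."""
    counts = Counter(decisions)
    label, _ = counts.most_common(1)[0]
    return label


# Three hypothetical classifiers voting on one word image.
print(fuse_by_vote(["city", "vegas", "city"]))
```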

No matter how fancy the classification technique, though, the features are the most important part. Without the right features, the best classification routine won't work.

2.8 Conclusion

This chapter of the thesis reviewed current techniques in handwritten word recognition and the past success of Fourier coefficients in recognizing machine-typed text. It is the goal of this study to determine whether or not it is feasible to use Fourier coefficients computed from handwritten words for classification.
III. Approach and Methodology


3.1 Introduction

This part of the thesis describes the method used in this study to determine whether or not the Fourier coefficients computed from the word images are of value in classifying hand-printed words. The chapter describes the data used, how it was preprocessed, the method of feature extraction, and the type of classification used.

3.2  Data  Set  Description 

3.2.1 Handwritten Words. In general, it is very difficult to obtain real world data for pattern recognition. It is especially difficult to find a good set of handwritten words for an experiment. The State University of New York at Buffalo produced a database of handwritten cities, states, and ZIP codes. It is from this database that some of the handwritten words used in this study were obtained.

To further illustrate the difficulty of obtaining data, the database contains over 3000 handwritten words, yet few examples of any particular word exist in the database; therefore, the words with the most occurrences were chosen. The words 'buffalo', 'city', 'Washington', and 'vegas' had the most examples. These examples consisted of printed as well as script styles of writing. Only the printed samples were selected for the data set. This resulted in a data set of 4 classes with 18 patterns in each class. This is not enough data to produce meaningful results, so hand-printed samples of the words 'buffalo', 'city', 'Washington', and 'vegas' were collected from various people in our lab. The final data set for the 4-class problem contained 400 samples, 100 from each class.

3.2.2  Examples  of  the  Words.  Figure  3.1  displays  some  examples  of  the  data. 
3.2.3 Searching the Database for Words. After determining which words will be used for an experiment, a script file can be used to search the database for the particular words desired. The script file searches the image truth files, and when it finds a word it is searching for, it converts the file to a Sun raster image file. See Appendix B.

3.3  Preprocessing  the  Images 

Since the words were already well segmented, it is only necessary to do some minor preprocessing. The two preprocessing techniques used are described below.

3.3.1 Binarization. A binarization of the image is performed mainly to standardize the images. Some of the images have darker backgrounds than others. After binarization, a pixel that represents part of the word has a value of zero and any part of the image that is background has a pixel value of 255. Figure 3.2 illustrates the original image before binarization. Figure 3.3 illustrates the image after binarization.
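
A fixed-threshold binarization consistent with the convention above (word pixels 0, background 255) might be sketched as follows. The threshold value of 128 and the function name `binarize` are assumptions made for illustration; the text does not state how the threshold was chosen.

```python
def binarize(image, threshold=128):
    """Map each gray level to 0 (word) or 255 (background).

    image is a list of rows of gray levels in the range 0..255.
    The threshold of 128 is an illustrative assumption.
    """
    return [[0 if pixel < threshold else 255 for pixel in row] for row in image]


gray = [[10, 200], [130, 90]]
print(binarize(gray))
```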

3.3.2 Window the image. In this step, the image is cropped so that the word completely fills the window of the image. This is done on the advice of Dr. Kabrisky. Capt O'Hair did all his work using words that completely filled the window (17). Performing this step provides scale invariance. Figure 3.4 illustrates the cropped image. Again, this preprocessing method assumes that word segmentation can be accomplished.
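
One way to window a binarized image is to crop it to the bounding box of the word pixels, as in the sketch below. The name `crop_to_word` is hypothetical, and the code assumes the binarization convention of Section 3.3.1 (word pixels equal to 0); it illustrates the idea and is not the routine used in this study.

```python
def crop_to_word(image, ink=0):
    """Crop a binarized image to the smallest window containing all ink pixels."""
    rows = [r for r, row in enumerate(image) if ink in row]
    if not rows:
        return image  # blank image: nothing to crop
    width = len(image[0])
    cols = [c for c in range(width) if any(row[c] == ink for row in image)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


page = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 255, 0, 255],
    [255, 255, 255, 255],
]
print(crop_to_word(page))
```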


Figure  3.2  Original  Image 




Figure  3.3  Binarized  Image 
3.4 Feature Extraction

3.4.1 Fourier Feature Extraction. The two-dimensional discrete Fourier Transform is described by the following equation:

$$S(f_x, f_y) = \sum_{x}\sum_{y} s(x,y)\left[\cos 2\pi(f_x x + f_y y) - i\,\sin 2\pi(f_x x + f_y y)\right]$$

where, 

•  fx  =  spatial  frequency  in  x

•  fy  =  spatial  frequency  in  y 

•  M  =  height  of  image  in  pixels 

•  N  =  length  of  image  in  pixels 

• Range of spatial frequencies in x to calculate up to 10 harmonics: -10/N, -9/N, ..., 9/N, 10/N

• Range of spatial frequencies in y to calculate up to 10 harmonics: -10/M, -9/M, ..., 9/M, 10/M

•  x,y  =  location  of  real  valued  input 

•  s(x,y)  =  intensity  of  image  at  location  x,y 

Each image of a handwritten word is in the form of a two-dimensional array of gray
level pixel values ranging from 0 to 255. After binarization, the intensity, s(x,y), of
the image at pixel location x,y is either 0 or 255. The array has a height M and a length N.

By taking advantage of the symmetry of the Fourier Transform, only half of the
coefficients need to be calculated. Because the cosine is an even function and the sine is
an odd function, the following properties apply for a real-valued image: (8)

•  Re[F(A,B)]  =  Re[F(-A,  -B)] 

•  Re[F(-A,B)]  =  Re[F(A,  -B)] 

•  Im[F(A,B)]  =  -Im[F(-A,  -B)] 

•  Im[F(-A,B)]  =  -Im[F(A,  -B)] 

When calculating discrete values of the Fourier Transform up to the third harmonic,
a total of 49 cosine terms (48 plus the dc term) and 49 sine terms (the dc term is zero)
are produced. Using the properties listed above, only half the cosine and sine terms need
to be calculated. This results in 25 cosine terms and 24 sine terms, for a total of 49
unique coefficients with no duplication. This not only reduces the number of calculations
but also eliminates redundancy in the feature set.
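
The symmetry argument above can be checked numerically. The following is a minimal sketch, not the program used in this research: the function names `fourier_harmonics` and `unique_features`, the NumPy implementation, and the ordering of the output vector are all assumptions made for illustration. It evaluates the cosine and sine sums of the transform at harmonics -n..n and keeps one member of each symmetric pair.

```python
import numpy as np

def fourier_harmonics(image, n_harmonics):
    """Cosine and sine sums of a real image at harmonics -n..n in y and x.

    Returns two (2n+1) x (2n+1) arrays indexed [harmonic in y, harmonic in x],
    with the dc term at the center.
    """
    M, N = image.shape                      # M = height, N = length
    k = np.arange(-n_harmonics, n_harmonics + 1)
    y = np.arange(M)
    x = np.arange(N)
    # phase[ky, kx, y, x] = 2*pi*(fx*x + fy*y) with fx = kx/N and fy = ky/M
    phase = 2 * np.pi * (
        k[None, :, None, None] * x[None, None, None, :] / N
        + k[:, None, None, None] * y[None, None, :, None] / M
    )
    cos_terms = np.sum(image * np.cos(phase), axis=(2, 3))
    sin_terms = -np.sum(image * np.sin(phase), axis=(2, 3))
    return cos_terms, sin_terms

def unique_features(cos_terms, sin_terms):
    """Keep one member of each symmetric pair (ordering is an assumption).

    In row-major order, entry i and entry (size - 1 - i) are mirror images
    through the dc term, so the first half plus the center is sufficient.
    """
    half = cos_terms.size // 2              # 24 for a 7x7 array
    return np.concatenate(
        [cos_terms.ravel()[: half + 1], sin_terms.ravel()[:half]]
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.choice([0.0, 255.0], size=(12, 30))    # stand-in binarized image
    cos_terms, sin_terms = fourier_harmonics(image, 3)
    print(np.allclose(cos_terms, cos_terms[::-1, ::-1]))    # even symmetry
    print(np.allclose(sin_terms, -sin_terms[::-1, ::-1]))   # odd symmetry
    print(unique_features(cos_terms, sin_terms).shape)
```

For three harmonics the feature vector holds 25 cosine terms (including dc) and 24 sine terms, which matches the count of 49 given in the text.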


3.5 Energy Normalization


Once the Fourier coefficients are calculated, they are energy normalized. Since each
image of a word has a different size pixel array, the coefficient values must be put on a
common scale before they can be compared. The energy normalization used here divides
each coefficient by the square root of the sum of the squares of all the coefficients. The
following formula describes this.

$$\langle S_{r,c} \rangle = \frac{S_{r,c}}{\sqrt{\sum_{r=-n}^{n}\sum_{c=-n}^{n} S_{r,c}^{2}}} \qquad (3.1)$$

where,

• $\langle S_{r,c} \rangle$ = the normalized (r,c)'th element

• r = rows

• c = columns

• n = number of harmonics
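
As a minimal sketch of Equation 3.1 (the function name `energy_normalize` and the handling of an all-zero input are assumptions, not part of the original method):

```python
import numpy as np

def energy_normalize(coeffs):
    """Divide every coefficient by the root of the sum of squared coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    energy = np.sqrt(np.sum(coeffs ** 2))
    if energy == 0.0:
        # Assumption: an all-zero array is returned unchanged.
        return coeffs.copy()
    return coeffs / energy

if __name__ == "__main__":
    small = np.array([[1.0, 2.0], [3.0, 4.0]])
    large = 10.0 * small          # same pattern, ten times the energy
    print(np.allclose(energy_normalize(small), energy_normalize(large)))
```

After normalization the squared coefficients sum to one, so two images that differ only by an overall scale factor in their coefficients map to the same feature values.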


3.6  Calculating  2D  Discrete  Fourier  Transform 

For each image, the Fourier Transform was computed. A total of 10 harmonics were
computed. Effectively this implements a 21x21 spatial filter in the frequency domain, so a
total of 441 numbers result for each word image. From this set of numbers, a more specific
spatial frequency filter can be extracted, for example a 7x7 spatial frequency filter which
captures the lower 3 harmonics, or a 3x5 spatial filter.

Figure 3.5 illustrates the numbers used to specify a particular Fourier coefficient
when computing the Fourier Transform. The features above the horizontal line are the
cosine terms and the features below the line are the sine terms. The dc term is in the
middle. Each square outline represents a particular harmonic. The results chapter refers
to these numbers. When a 7x7 spatial frequency filter is mentioned, it refers to the
7x7 square centered on the dc term.
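
Extracting a smaller filter from the 21x21 array amounts to slicing a block centered on the dc term. The sketch below is illustrative only: `center_block` is a hypothetical helper, and the row-major numbering of the stand-in grid is an assumption rather than the numbering of Figure 3.5.

```python
import numpy as np

def center_block(coeffs, rows, cols):
    """Return the rows x cols block centered on the middle (dc) element.

    Assumes coeffs has odd dimensions with the dc term at the center and
    that rows and cols are odd.
    """
    center_r = coeffs.shape[0] // 2
    center_c = coeffs.shape[1] // 2
    half_r, half_c = rows // 2, cols // 2
    return coeffs[center_r - half_r : center_r + half_r + 1,
                  center_c - half_c : center_c + half_c + 1]

if __name__ == "__main__":
    grid = np.arange(441).reshape(21, 21)   # stand-in for the 21x21 coefficients
    low_pass = center_block(grid, 7, 7)     # lower 3 harmonics
    print(low_pass.shape)
    print(center_block(grid, 3, 5).shape)   # a 3x5 spatial filter
```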

Figures 3.6 and 3.7 illustrate the reconstruction of a word image from the lower three
harmonics and lower ten harmonics respectively.

Figure 3.5 Feature Numbers for 10 Harmonics of the 2-dimensional discrete Fourier
Transform

Figure 3.7 Image of 'Buffalo' Reconstructed From Ten Harmonics

3.6.1 Figure of Merit Features. "The ability of a feature to separate two classes
depends on the distance between classes and the scatter within classes" (18). Fisher's
discriminant is a method to characterize the separability of features from two classes based
on the distance between classes and the scatter within classes. This method is based on the
following equation from Parsons (18).


$$f = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} \qquad (3.2)$$

Figure 3.6 Image of 'Buffalo' Reconstructed From Three Harmonics


An extension to multiple classes is defined by the F ratio equation, or generalized
Fisher's discriminant, defined in Equation 3.3 (18).


$$F = \frac{\text{variance of the means (over all classes in one dim.)}}{\text{mean of the variances (within classes in one dim.)}}$$

For n features and m classes this is defined mathematically,

$$F = \frac{\frac{1}{m-1}\sum_{j=1}^{m}(\mu_j - \bar{\mu})^2}{\frac{1}{m(n-1)}\sum_{j=1}^{m}\sum_{i=1}^{n}(x_{ij} - \mu_j)^2} \qquad (3.3)$$

where,

• $x_{ij}$ = ith feature for class j

• $\mu_j$ = mean of all features for class j

• $\bar{\mu}$ = mean of all measurements over all classes

This can be used to evaluate each feature's ability to separate the classes. The higher
the F ratio, the more separable the classes are along that feature and therefore the more
easily classifiable. For the 441 features that were calculated by the Fourier transform, an
F ratio was calculated for each feature over 100 samples from each class. The F ratios were
then rank ordered to see which features had the most separability. According to Parsons,
simply selecting the features with the best F ratios for classification is not safe (18).
Selecting the best group of features for classification can, however, be heuristic in nature.
This assumption will be tested.
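
A minimal sketch of the F ratio of Equation 3.3 and of the rank ordering step follows. The function names, the array layout (classes x samples x features), and the assumption of an equal number of samples per class are illustrative choices, not details taken from the original software.

```python
import numpy as np

def f_ratio(samples):
    """F ratio for ONE feature; samples has shape (m classes, n measurements)."""
    samples = np.asarray(samples, dtype=float)
    m, n = samples.shape
    class_means = samples.mean(axis=1)
    overall_mean = class_means.mean()
    # Numerator: variance of the class means.
    var_of_means = np.sum((class_means - overall_mean) ** 2) / (m - 1)
    # Denominator: mean of the within-class variances.
    mean_of_vars = np.sum((samples - class_means[:, None]) ** 2) / (m * (n - 1))
    return var_of_means / mean_of_vars

def rank_features(data):
    """data has shape (m classes, n samples, d features).

    Returns feature indices sorted by descending F ratio, and the ratios.
    """
    data = np.asarray(data, dtype=float)
    ratios = np.array([f_ratio(data[:, :, k]) for k in range(data.shape[2])])
    return np.argsort(ratios)[::-1], ratios

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(2, 100, 3))   # synthetic data, 100 samples per class
    data[1, :, 1] += 5.0                  # make feature 1 well separated
    order, ratios = rank_features(data)
    print(order, np.round(ratios, 3))
```

The synthetic data are only there to exercise the functions; they say nothing about the handwritten word features themselves.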


3.7 Feature Subset Evaluation

Once all 441 features are calculated from the Fourier Transform, it is desirable to
find a subset of the 441 features to use for classification. Not only will a subset
reduce the amount of computation, but it may also reveal a much better feature set with
which to classify. Several methods are used to try to select the right group of features.
The methods discussed are spatially filtering the Fourier Transform, calculating a figure
of merit on each dimension of the data, calculating the Karhunen-Loeve transform, and
using magnitude and phase information.

3.7.1 A 7x7 Spatial Filter. As outlined in Chapter 2 of this document, the lower
3 harmonics have proven very successful in terms of classification of digits and machine
printed words, so this is naturally one feature set that is used. The feature subset is
developed by extracting the 7x7 array of features from the 21x21 array calculated by the
Fourier Transform. Once the subset is established, the classification is performed.

3.7.2 Figure-of-Merit. The figure of merit or F-ratio is calculated on all 441
dimensions of the feature set. The result is a rank ordering of F-ratios and their
corresponding features. Again, the higher the F-ratio, the more separable the data in that
particular dimension. From this ranking, the top ten features are selected to be members
of the feature subset. The classification technique performed is the add-on technique
described later.

In addition, the top 49 figure of merit features are selected for another feature subset.
This  will  allow  a  comparison  between  using  features  from  the  lower  3  harmonics  versus 
using  the  highest  ranking  figure-of-merit  features. 

3.7.3 Karhunen-Loeve Transformation. The goal in doing a Karhunen-Loeve
transformation on the feature set is to obtain features which are best suited for separating
classes (18). To compute the Karhunen-Loeve transform of the feature set, the covariance
matrix of the feature set is computed, then the eigenvalues and respective eigenvectors
of the covariance matrix are calculated and arranged in descending eigenvalue order (28).
The dimensionality of the original feature set is reduced to the number of eigenvectors
selected. The eigenvectors selected make up the transformation matrix A in the equation
y = Ax, where x is the original feature set and y is the new feature set (18). The eigenvalues
give the variance of the new features in y (18). The new feature space contains the set of
features with the greatest variance, and thus these features are considered the "true"
features (18). The features with low variances are considered noise and are eliminated (18).
In effect, the transformation computes a new orthogonal feature space with each dimension
corresponding to a variance, the first dimension having the greatest variance.

A Karhunen-Loeve transform was computed on the original feature set of 441 features
calculated from the Fourier Transform. The 441 dimensions were reduced to 10 dimensions
by placing the 10 eigenvectors with the largest eigenvalues in the transformation matrix, A.
A new feature subset consisting of 10 features per sample is then tested for classification.
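
The steps described above (covariance matrix, eigen-decomposition, descending sort, projection with y = Ax) can be sketched as follows. The function name `kl_transform`, the use of NumPy's `eigh`, and the mean-centering of the data before projection are assumptions of this sketch; the text does not state whether the original computation centered the data.

```python
import numpy as np

def kl_transform(features, n_components):
    """Project samples onto the eigenvectors with the largest eigenvalues.

    features: array of shape (n_samples, n_features).
    Returns (y, A, variances): the new features, the transformation matrix
    whose rows are the selected eigenvectors, and their eigenvalues.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)              # assumption: mean removed first
    cov = np.cov(centered, rowvar=False)       # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    A = eigvecs[:, order].T                    # rows of A are eigenvectors
    y = centered @ A.T                         # y = A x for every sample
    return y, A, eigvals[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 200 samples, 6 features with unequal spread.
    X = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
    y, A, variances = kl_transform(X, 3)
    print(y.shape, np.round(variances, 2))
```

In this sketch the sample variance of each new feature equals its eigenvalue, which is the property the text relies on when it keeps the high-variance directions and discards the rest.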

3.7.4 Magnitude and Phase. Since so much of the information about an object is
contained in the phase, another feature subset was developed by computing the phase
of  a  7x7  spatial  frequency  filter.  In  addition,  the  magnitude  was  computed  for  another 
feature  subset.  The  phase  was  obtained  by  computing  the  arctan  of  the  sine  term  divided 
by  the  cosine  term  for  a  specific  set  of  spatial  frequencies.  For  example,  24  unique  phase 
terms  are  calculated  from  the  49  unique  terms  generated  from  calculating  3  harmonics  of 
the  Fourier  Transform.  The  magnitude  is  obtained  by  computing  the  square  root  of  the 
sum  of  the  squares  of  the  specific  cosine  and  sine  term. 

Four separate feature subsets were generated. The first feature subset consisted of
24  features  per  sample  representing  the  phase  of  a  7x7  spatial  frequency  filter.  The  second 
feature  subset  contained  25  features  per  sample  representing  the  magnitude.  The  dc  term 
is  also  included.  The  third  feature  subset  contained  49  features  representing  both  the  phase 
and  the  magnitude.  The  fourth  feature  subset  contained  only  the  imaginary  components 
or  the  sine  terms.  This  subset  consisted  of  24  features  per  sample. 
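
A minimal sketch of the magnitude and phase computation follows. The function name and the convention that the dc term is stored last in the cosine vector are assumptions for illustration. The sketch also uses `arctan2`, a quadrant-aware variant of the arctangent of the sine term divided by the cosine term; that substitution is a choice made here to avoid division by zero, not something stated in the text.

```python
import numpy as np

def magnitude_phase_features(cos_unique, sin_unique):
    """Magnitude and phase from unique cosine and sine terms.

    cos_unique: cosine terms with the dc term stored LAST (assumed layout).
    sin_unique: sine terms, one per non-dc cosine term.
    Returns (magnitude, phase); magnitude includes the dc term.
    """
    cos_unique = np.asarray(cos_unique, dtype=float)
    sin_unique = np.asarray(sin_unique, dtype=float)
    dc = cos_unique[-1]
    cos_ac = cos_unique[:-1]
    # Square root of the sum of the squares of each cosine/sine pair.
    magnitude = np.append(np.hypot(cos_ac, sin_unique), abs(dc))
    # Quadrant-aware arctangent of sine over cosine.
    phase = np.arctan2(sin_unique, cos_ac)
    return magnitude, phase

if __name__ == "__main__":
    magnitude, phase = magnitude_phase_features(np.ones(25), np.ones(24))
    print(magnitude.shape, phase.shape)
```

With 25 cosine terms and 24 sine terms this yields 25 magnitude features and 24 phase features, the subset sizes quoted in the text.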

3.7.5 Add-on Procedure. Ideally, it would be desirable to test all combinations of
features to find the best set with which to classify; however, the amount of computation
is prohibitive for this method, especially for a large number of features. There are other
methods which try to pare down the combinations of features without testing each
combination. One method is called the add-on procedure (18).

In the add-on procedure, every feature from the feature set is evaluated by testing
its classification ability. The best individual feature is then selected to be evaluated in
combination with each remaining feature. The best combination of two features is then
selected to be evaluated in combination with each remaining feature. The best combination
of three features is selected in the same way, and so on until the desired subset of features
is attained. The desired subset of features is the one that gives the best performance. This
method requires only k(2N + 1 - k)/2 evaluations, where k is the number of features in the
subset and N is the total number of features under consideration (18). Even using the
add-on procedure, the 441 features calculated from the Fourier Transform are a large
number to evaluate. The feature subsets described above are the feature sets that are
classified using this add-on procedure.
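
The add-on procedure is a greedy forward selection, and a short sketch makes the evaluation count concrete. Everything named here is hypothetical: `add_on_selection`, `evaluation_count`, and the `score` callback stand in for training and testing a classifier on a candidate subset. The sketch also stops at a fixed subset size, which simplifies the "best performance" stopping rule described in the text.

```python
def add_on_selection(n_features, subset_size, score):
    """Greedy add-on selection.

    score(subset) must return a number where larger is better, for example
    the recognition rate obtained with that list of feature indices.
    Returns the selected feature indices and the number of evaluations made.
    """
    selected = []
    remaining = list(range(n_features))
    evaluations = 0
    for _ in range(subset_size):
        best_feature, best_score = None, float("-inf")
        for feature in remaining:
            candidate_score = score(selected + [feature])
            evaluations += 1
            if candidate_score > best_score:
                best_feature, best_score = feature, candidate_score
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, evaluations

def evaluation_count(n_features, subset_size):
    """k(2N + 1 - k)/2 evaluations for a subset of k out of N features."""
    k, N = subset_size, n_features
    return k * (2 * N + 1 - k) // 2

if __name__ == "__main__":
    # Toy score: each feature has a fixed, made-up usefulness.
    usefulness = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.05]
    toy_score = lambda subset: sum(usefulness[i] for i in subset)
    chosen, count = add_on_selection(10, 5, toy_score)
    print(chosen, count, evaluation_count(10, 5))
```

Choosing 5 of 10 features this way takes 10 + 9 + 8 + 7 + 6 = 40 evaluations, in agreement with k(2N + 1 - k)/2 for k = 5 and N = 10, compared with 252 evaluations to test every 5-feature combination.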

3.7.6 Miscellaneous Feature Subsets. A few additional feature subsets were
generated for testing. In the case of the 2-class problem, each harmonic was evaluated
individually, with and without the dc term. Further, for the 2-class problem, feature sets
were built by adding single features at a time, from the lower three harmonics and from the
FOM features, up to 49 features. Testing was completed after each addition of a feature.

3.8  Data  Normalization 

To achieve the feature set's invariance to scale and displacement, data normalization
is done (5). This is accomplished by calculating the mean and standard deviation for each
input dimension. Then, for each sample, the mean is subtracted from each input dimension
and the result is divided by the standard deviation for that dimension. The end result is
data with zero mean and unit variance in all dimensions.

It is interesting to note that the F ratio is the same whether or not the data has
been normalized. This is because multiplying a random variable by a constant multiplies
its variance by the square of that constant, while adding a constant to a random variable
leaves the variance unchanged (4). In the F ratio, the variance of the means is in effect
multiplied by 1/s^2, where s is the standard deviation used for normalization, and the mean
of the variances is also multiplied by 1/s^2. The 1/s^2 factors cancel and the F ratio
remains unchanged.
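
Both the normalization and the invariance of the F ratio can be checked with a few lines of code. This is a sketch under assumptions: the function names are hypothetical, the population standard deviation is used, equal class sizes are assumed, and the synthetic data are only for demonstration.

```python
import numpy as np

def zscore(data):
    """Subtract the mean and divide by the standard deviation per dimension.

    data has shape (n_samples, n_dimensions). A dimension with zero spread
    would divide by zero; this sketch does not handle that case.
    """
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)

def f_ratio(samples):
    """F ratio for one feature; samples has shape (m classes, n measurements)."""
    samples = np.asarray(samples, dtype=float)
    m, n = samples.shape
    class_means = samples.mean(axis=1)
    var_of_means = np.sum((class_means - class_means.mean()) ** 2) / (m - 1)
    mean_of_vars = np.sum((samples - class_means[:, None]) ** 2) / (m * (n - 1))
    return var_of_means / mean_of_vars

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # One feature, two classes, 50 synthetic measurements per class.
    samples = rng.normal(loc=[[0.0], [1.5]], scale=2.0, size=(2, 50))
    normalized = zscore(samples.reshape(-1, 1)).reshape(2, 50)
    print(np.isclose(f_ratio(samples), f_ratio(normalized)))
```

The normalization shifts every measurement by the same constant and scales it by the same 1/s, so the numerator and denominator of the F ratio are both scaled by 1/s^2.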
3.9 Classification


The classification tools used in this study are the multi-layer perceptron and the knn
classifier as described in Chapter 2 of this document. A program specifically designed for
pattern recognition, called LNKnet (12), was used for classification using the multi-layer
perceptron and the knn classifier. Some modification of the program is necessary to employ
the add-on technique and to test the data set after each epoch of training. The appendices
contain the script files that are used to accomplish this.

3.10 LNKnet Parameters

Every  test  conducted  in  LNKnet,  using  the  multi-layer  perceptron,  had  the  same 
parameters. A list of the parameters is given below.

• Training set: 50 random samples/class

•  Test  set:  50  random  samples/class 

•  Epoch  of  training:  25 

•  Number  of  hidden  nodes:  50 

•  Step  size  :  0.05 

•  Momentum  :  0.0 

• Tolerance : 0.2 (if the error of an output node is less than 0.2, the weights are not
updated)

• Decay : 0.0

•  Error  Function  :  Squared  Error 

•  Output  Node  Function  :  Standard  Sigmoid 

•  Weight  update  after  each  trial 

•  Random  presentation  order 


3.11 Conclusion


This chapter described the methods used to determine whether or not Fourier
coefficients computed from the whole word are of value in classifying hand-printed words.
A description of the data set used and how the data was preprocessed was given. Then,
computing the Fourier Transform of each word image resulted in a 441 dimensional feature
vector for each word. All the feature vectors together built the feature set. From this
feature set, many feature subsets were selected. The feature subsets consisted of
figure-of-merit features, 7x7 spatial filter features, combinations of magnitude and phase
features, Karhunen-Loeve features, and features from add-on testing. The next chapter
reports the results of 2-class and 4-class testing using these feature sets.

IV. Results

4.1  Results  of  a  2-class  problem  using  figure-of-merit  features 

To determine whether using Fourier coefficients as features for recognition is
worthwhile, a simple 2-class problem was attempted. The two classes for this problem are
the words 'Buffalo' and 'City.' Beginning with the pixelized image of the word and
computing the discrete Fourier transform of the image, the resulting coefficients, both
cosine and sine terms, form the feature set. Fourier coefficients up to the tenth harmonic
are calculated. It is from this feature set that the best subset of features is sought. In this
case, figure-of-merit features are used.

A figure-of-merit was computed across each dimension. In other words, the
separability of each dimension was calculated independent of any other dimension. Each
dimension is given a number corresponding to its separability. The higher the number, the
better the separation between classes for that dimension. The feature subset is built by
selecting the features with the highest figure-of-merit.

The "add-on" procedure described in Chapter 3 is the method used to find the best
combination of 5 features out of the ten features with the highest figure-of-merit. In
other words, a subset of features is first obtained from the ten features with the highest
figure-of-merit. Then, from this subset, the best combination of 5 features forms the final
feature subset.

Table 4.1 lists the top ten figure-of-merit features with their corresponding
figure-of-merit and harmonic. By referring to Figure 3.5, the feature number listed can be
traced to the particular Fourier coefficient calculated. One of the reasons why ten harmonics
are calculated is to find out if any of the higher harmonics have good separability. It is
interesting to note that all but one of the highest F-ratios are within a 7x7 low pass spatial
frequency filter. Considering the previous success with using 7x7 spatial filters on
recognition of characters, it is not surprising to find that the most separable dimensions are
within the 3 lowest harmonics. To give some meaning to the figure-of-merit numbers listed
in Table 4.1, Figure 4.2 plots a histogram of feature number 245 for 100 samples from both
classes. This illustrates the meaning of the figure-of-merit, 0.32, for feature number 245.
The graph shows that even the feature with the highest figure-of-merit is not very separable.
For comparison, Figure 4.7 shows the histogram for feature 198 with a 0.13 figure-of-merit.

Table 4.1 Top 10 Most Separable Features, Their Corresponding FOM and Harmonic

Feature   Figure-of-Merit   Harmonic
245       0.32              3
222       0.25              1
242       0.23              1
178       0.22              2
239       0.21              3
179       0.18              2
216       0.18              5
220       0.17              1
180       0.16              2
198       0.13              2

Figure 4.1 Histogram of Feature 198 for 200 Samples, 100/class

Figure 4.2 Histogram of Feature 198 for 200 Samples, 100/class

Since  all  remaining  features  have  a  lower  F-ratio  than  feature  number  245,  no  single 
feature  alone  is  usable  for  classification;  however,  the  combination  of  two  or  more  not  very 
separable features may further separate the classes. The "add-on" procedure will find a
combination of features that will increase the separability.

The training set and test set both consisted of 100 samples randomly selected and
evenly distributed between the classes. Three trials were performed and the results were
averaged. The training error reported is the one that corresponds to the lowest test error:
when the lowest test error occurred, the corresponding training error was noted. When
training the multilayer perceptron, it is possible that the test error is no longer at its lowest
by the time the training error decreases and eventually reaches zero. Figure 4.3 illustrates
this effect during a run in which the test set was tested after each epoch of training. As the
figure shows, the test error decreases along with the training error until it reaches a point,
at 23% error after 11 epochs, where it begins to increase even though the training error is
still decreasing. To overcome this effect, testing was conducted after each epoch of training
and the lowest test error was selected. Tables 4.2 and 4.3 show the results of classification
testing using the add-on procedure to find the best combination of 5 features out of the 10
FOM features. The test error reported is the lowest test error during training, and the
training error reported is the training error at the point where the test error was minimized.
The best combination of 5 features did not consist of the top 5 FOM features. This supports
the notion that the combination of 2 relatively separable features may not be the best
combination of any 2 features. After the combination is made, its separability must be
determined. In this case, the classification method determined the separability of the
combination of features by the test results. Using this method, the multilayer perceptron
performed the best: 84 out of 100 samples were correctly classified using 5 features.

Figure 4.3 Training and Testing Error vs Epochs of Training
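
The bookkeeping described above, testing after every epoch and reporting the lowest test error together with the training error at that same epoch, can be sketched in a few lines. The function name `best_epoch` is hypothetical, and the error values in the usage example are invented purely to exercise the function; they are not results from this research.

```python
def best_epoch(test_errors, train_errors):
    """Return (epoch, test error, training error) at the lowest test error.

    Both lists hold one value per epoch, in training order. Epochs are
    numbered from 1, and the earliest epoch wins when there is a tie.
    """
    if len(test_errors) != len(train_errors):
        raise ValueError("one test error and one training error per epoch")
    index = min(range(len(test_errors)), key=lambda e: test_errors[e])
    return index + 1, test_errors[index], train_errors[index]

if __name__ == "__main__":
    # Made-up error curves: the test error bottoms out before training ends.
    test_curve = [40, 30, 23, 25, 28]
    train_curve = [38, 27, 18, 10, 4]
    print(best_epoch(test_curve, train_curve))
```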

Table 4.2 Best Combinations of Top 10 FOM Features and the Resulting MLP Test Error

Feature               % Error(testing)   Rmserr(testing)   % Error(training)
222                   26                 .44               35
222,178               21                 .42               26
222,178,180           22                 .43               25
222,178,180,245       19                 .41               21
222,178,180,245,198   16                 .38               17

Table 4.3 Best Combinations of Top 10 FOM Features and the Resulting 1-nn Test Error

Feature               % Error(testing)   Rmserr(testing)
178                   40                 .63
178,179               36                 .6
178,179,180           30                 .55
178,179,180,222       28                 .53
178,179,180,222,198   26                 .5

4.2 Results of 2-class Problem Using 49 Features From Lower Three Harmonics

A test was done to evaluate the test error as single features were added, beginning
with one feature and adding one feature at a time up to 49 features. In this case the
features were the 49 features of the 3 lower harmonics. Notice that the test error drops
quickly as the first few features are added, then reaches a point where no improvement in
test error is achieved with additional features. The lowest test error was 19.0%. Not only is
it important to find good features, but it is just as important to find the right number of
features that will give the best recognition. Figure 4.4 shows the results and Table 4.4 lists
the specific features.


Figure 4.4 MLP Test Error vs Number of Features in the Lower Three Harmonics


4.3  Results  of  2-class  Problem  Using  49  FOM  features 

The same test described in the previous section was completed using 49 FOM
features. In this case, the FOM features used were the ones with the highest FOM. The
first feature corresponded to the highest figure of merit and each additional feature
corresponded to the next highest figure of merit. Again, the test error drops quickly but
reaches a point where additional features do not enhance recognition. Figures 4.5 and 4.6
show the results of classification testing and Table 4.5 lists the specific features. Notice
that, when classifying with the 1-nn, the test error reaches a minimum and then slowly
increases with additional features. This supports Dr. Kabrisky's theory that the addition of
more features only adds noise and hence decreases the recognition performance.

Table 4.4 Listing of the 49 Fourier Features Used


Table 4.5 List of the 49 FOM features (each entry is Num: FOM Feature)

 1: 245    8: 220   15: 285   22: 260   29: 207   36: 135   43: 165
 2: 222    9: 180   16: 201   23: 163   30: 247   37: 219   44: 224
 3: 242   10: 198   17: 223   24: 261   31: 158   38: 197   45: 119
 4: 178   11: 243   18: 246   25: 269   32: 174   39: 240   46: 230
 5: 239   12: 264   19: 244   26: 193   33: 182   40: 241   47: 199
 6: 179   13: 218   20: 196   27: 370   34: 227   41: 327   48: 268
 7: 216   14: 203   21: 237   28: 226   35: 234   42: 229   49: 342

Figure 4.5 MLP Test Error vs Number of Top FOM Features


Figure  4.6  1-nn  test  error  vs  number  of  top  FOM  features 


4.4 Results of Some Miscellaneous Feature Subsets

To conclude the testing of the 2-class problem, some miscellaneous feature subsets
were built and tested. Table 4.6 lists the results. The imaginary terms alone performed
well for recognition purposes, as did the first harmonic. The performance was degraded for
each subsequent harmonic.


Table 4.6 MLP Test Results from Various Feature Combinations

Features                                         % Error(testing)   Rmserr(testing)   % Error(training)
3x5                                              23                 .32               20
imag only (lower three harmonics)                17                 .31               14
imag only (lower three harmonics) plus dc term   25                 .35               16
1st harmonic                                     23                 .31               27
1st harmonic plus dc                             24                 .34               18
2nd harmonic                                     26                 .36               21
2nd harmonic plus dc                             22                 .34               25
3rd harmonic                                     30                 .39               15
3rd harmonic plus dc                             30                 .38               13

4.5 4-Class Results

The results of the 2-class problem show promise in using Fourier coefficients as
features. Further testing, using more feature subsets and additional classes, is needed to
confirm the recognition capability of Fourier coefficients. The next set of results examines
the use of several new feature subsets. Two of the same sets used for the 2-class problem
are again used for the 4-class problem: the 7x7 Spatial Frequency Filter and the 10
figure-of-merit features. A more in-depth look at feature subsets is completed by the
addition of magnitude, phase, and Karhunen-Loeve features. The two additional classes
are 'Vegas' and 'Washington.'

4.6 Results of a 4-class Problem Using FOM features

This experiment is the same as described in the 2-class problem using figure-of-merit
features. Table 4.7 lists the top ten figure-of-merit features with their corresponding
figure-of-merit and harmonic for the 4-class data. By referring to Figure 3.5, the feature
number listed can be traced to the particular Fourier coefficient calculated.

Table 4.7 Top 10 Most Separable Features, Their Corresponding FOM and Harmonic

Again, it is interesting to note that all but one of the highest F-ratios are within a 7x7
low pass spatial frequency filter. In this case, six of the top ten are within the first
harmonic. Not only is the bulk of the information in the lower harmonics, but this
information is also the most separable. Tables 4.8 and 4.9 show the results of classification
testing using the add-on procedure to find the best combination of 5 features out of the 10
FOM features. The combination of five features correctly classified 73 out of 100.


Table 4.8 Best Combinations of Top 10 FOM Features and the Resulting MLP Test Error

Feature               % Error(testing)   Rmserr(testing)   % Error(training)
198                   56                 .43               58
198,223               45                 .40               49
198,223,219           36                 .37               38
198,223,219,243       32                 .33               34
198,223,219,243,200   27                 .30               24

Table  4.9  Confusion  matrix  for  Best  Combination  of  5  FOM  features 


4.7 Results of a 4-class Problem Using Features from a 7x7 Low-pass Spatial Frequency Filter

This section reports on the results of using a 7x7 spatial frequency filter on the
Fourier coefficients. The feature set consisted of 49 terms: 24 cosine terms, 24 sine terms,
and the dc term. Classification testing was completed using both the multilayer
perceptron and the knn classifier. For the multilayer perceptron, only one parameter was
changed, and that was the number of hidden nodes. A test was done with 50, 100, and 200
hidden nodes to see if the number of hidden nodes had any effect. For the knn classifier,
the only parameter changed was the number of neighbors, k. Four values of k were
used: 1, 3, 5, and 7.
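
For readers unfamiliar with the knn classifier, the following generic sketch shows the idea of voting among the k nearest training samples. It is not LNKnet's implementation: the function name, the Euclidean distance, and the tie-breaking rule (the lowest class label wins) are all assumptions, and the toy points are invented for the example.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k):
    """Classify each test sample by majority vote of its k nearest neighbors.

    train_x: (n_train, d) features; train_y: (n_train,) non-negative integer
    class labels; test_x: (n_test, d) features.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=int)
    test_x = np.asarray(test_x, dtype=float)
    # Euclidean distance from every test sample to every training sample.
    distances = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = train_y[nearest]
    # bincount + argmax picks the most frequent label (lowest label on ties).
    return np.array([np.bincount(row).argmax() for row in votes])

if __name__ == "__main__":
    train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    train_y = [0, 0, 0, 1, 1, 1]
    test_x = [[0.2, 0.1], [5.5, 5.2]]
    for k in (1, 3):
        print(k, knn_predict(train_x, train_y, test_x, k))
```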

Table  4.10  lists  the  results  of  this  set  of  testing  by  the  multilayer  perceptron.  The 
number  of  hidden  nodes  in  the  multilayer  perceptron  had  little  effect.  By  increasing  the 
niunber  of  classes,  the  recognition  performance  decreased  to  62%  correct  from  the  77% 
percent  achieved  in  the  2-class  case.  This  indicates  that  the  features  tend  to  cluster  together 
with  the  addition  of  more  classes.  Table  4.11  list  the  results  of  the  same  set  using  the  k-im 
classifier.  The  knn  classification  rate  of  26.5%  on  the  test  set  siirpassed  the  results  of 
testing  with  the  multilayer  perceptron.  Calculating  the  distance  to  additional  neighbors 
actually  degrades  the  performance  of  the  knn  classifier.  Table  4.12  list  the  confusion 
matrix  on  the  test  set  with  the  best  classification  rate  for  this  section  of  testing.  The 
references  to  the  classes  are  as  follows:  0  -  ’Buffalo’,  1  -  ’Vegas’,  2  -  ’Washington’,  3  - 
’City’.  The  recognition  rate  for  each  class  is  similar,  not  one  class  was  easily  recognizable. 
This  particular  set  of  featmres  have  discrimination  potential.  The  res\ilts  indicate  that  as  a 
rough  cut  for  classifying  words,  a  73.5%  recognition  rate  was  achieved.  Clearly,  differences 
in  hemdwritten  words  we  seen  in  the  lower  3  harmonics  of  the  Fourier  IVansform.  Finally, 

Table  4.10  MLP  Test  Results  Using  49  Features  Prom  the  7x7  Low-pass  Spatijil  Filter 


Hidden  Nodes 

%  Error(testing) 

Rmserr(testing) 

%  Error(training) 

50 

39 

.37 

12 

100 

38 

.39 

8 

200 

40 

.42 

9 

Figure  4.7  illustrates  the  test  error  as  individual  features  are  added  one  at  a  time. 
Table 4.11 K-nn Test Results Using 49 Features From the 7x7 Low-pass Spatial Filter

k   % Error (testing)   Rmserr (testing)
1   26.5                .364
3   27.0                .315
5   32.5                .323
7   35.0                .339

Table 4.12 Confusion Matrix for the 1-nn Using 49 Features From the 7x7 Low-pass Spatial Filter

Actual class   Classified as
               0    1    2    3
0              34   2    7    7
1              2    35   13   0
2              1    11   37   1
3              9    0    0    41
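As a cross-check on the quoted 73.5% figure, the recognition rate can be recomputed from the confusion matrix: it is the sum of the diagonal divided by the total number of test patterns. The short sketch below does this for the matrix in Table 4.12. The function names are illustrative only and are not part of the software used in this research.

```python
# Confusion matrix from Table 4.12 (rows: actual class, columns: classified as).
CONFUSION_1NN = [
    [34, 2, 7, 7],    # 0 - 'Buffalo'
    [2, 35, 13, 0],   # 1 - 'Vegas'
    [1, 11, 37, 1],   # 2 - 'Washington'
    [9, 0, 0, 41],    # 3 - 'City'
]


def recognition_rate(matrix):
    """Fraction of all test patterns that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_rates(matrix):
    """Fraction of each actual class that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


overall = recognition_rate(CONFUSION_1NN)
by_class = per_class_rates(CONFUSION_1NN)
print(f"overall recognition rate: {overall:.1%}")
print("per-class rates:", [f"{r:.0%}" for r in by_class])
```

The per-class rates make the remark about class similarity concrete: they range from 68% to 82%.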

4.8 Results of a 4-class Problem Using Features from a 7x7 Low-pass Spatial Frequency Filter in Add-on Testing

This section reports the results of add-on testing of the 49 Fourier coefficients from the 7x7 spatial frequency filter. A recognition rate of 68.7% was achieved. This compares favorably to the results of the figure-of-merit features because three of the features of the best combination are the same. It is interesting to note that the first feature, 198, was the best feature in both the figure-of-merit test and this test. Feature 198 tested better when paired with another feature that was not among the top 10 figure-of-merit features; however, the subsequent combinations returned a lower test error. This supports using figure-of-merit selection over straight add-on. Table 4.13 lists the results.
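The add-on procedure is a greedy forward selection: start with no features, repeatedly add the single candidate that gives the lowest test error in combination with the features already kept, and stop at the desired subset size. The sketch below illustrates the idea only. The feature names, the `TOY_GAIN` values, and the `toy_error` function are hypothetical stand-ins for training and testing a classifier on each candidate subset; they are not results from this research.

```python
def add_on_selection(candidates, test_error, max_features=5):
    """Greedy add-on (forward) selection.

    candidates:  iterable of feature identifiers
    test_error:  callable mapping a list of features to a test error
    Returns a list of (selected_features, error) after each addition.
    """
    selected, history = [], []
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Try each remaining feature together with those already kept.
        best = min(remaining, key=lambda f: test_error(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), test_error(selected)))
    return history


# Hypothetical per-feature error reductions (assumption, for illustration).
TOY_GAIN = {"f_a": 0.30, "f_b": 0.10, "f_c": 0.08,
            "f_d": 0.05, "f_e": 0.03, "f_f": 0.005}


def toy_error(subset):
    """Stand-in for 'train a classifier on subset and measure test error'."""
    error = 0.75 - sum(TOY_GAIN[f] for f in subset)
    if "f_b" in subset and "f_c" in subset:
        error += 0.07  # f_b and f_c are largely redundant in this toy setup
    return error


history = add_on_selection(TOY_GAIN, toy_error)
for features, error in history:
    print(features, round(error, 3))
```

Because each step keeps earlier choices fixed, add-on selection can miss combinations that a ranking such as figure-of-merit would surface, which is the trade-off discussed above.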

4.9 Results of a 4-class Problem Using Magnitude and Phase Features

The next sets of features used for classification were based on the magnitude and phase of the Fourier Transform, specifically the magnitude and phase of the coefficients passed by a 7x7 spatial frequency filter. In addition, a feature set consisting only of the imaginary components of the coefficients passed by the 7x7 spatial frequency filter was tested.
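For reference, the magnitude, phase, and imaginary-part features are simple functions of each complex Fourier coefficient. The following sketch shows the conversion on three made-up coefficients. The function names and sample values are assumptions for illustration and do not reproduce the feature extraction code used in this research.

```python
import cmath


def magnitude_phase_features(coefficients):
    """Split complex coefficients into magnitude and phase (radians) lists."""
    magnitudes = [abs(c) for c in coefficients]
    phases = [cmath.phase(c) for c in coefficients]
    return magnitudes, phases


def imaginary_features(coefficients):
    """Keep only the imaginary component of each coefficient."""
    return [c.imag for c in coefficients]


# Made-up coefficients (assumption, not data from the word images).
coeffs = [complex(3, 4), complex(0, -2), complex(-1, 0)]
mags, phases = magnitude_phase_features(coeffs)
imags = imaginary_features(coeffs)
print(mags)    # magnitudes
print(phases)  # phases in radians
print(imags)   # imaginary parts
```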
Figure 4.7 MLP Test Error vs Number of Features From the Lower 3 Harmonics

Table 4.13 Best combinations of 49 features from the 7x7 spatial frequency filter and the resulting MLP test error

Feature               % Error (testing)   Rmserr (testing)   % Error (training)
198                                       .48
198,260                                   .45
198,260,241           38                  .39
198,260,241,243       34.5                .36
198,260,241,243,223   31.3                .35                22

4.9.1 Results of a 4-class Problem Using Magnitude and Phase. This particular set of features included both the magnitude terms and the phase terms. A total of 49 features were used for this experiment. Again, for the multilayer perceptron, only one parameter was changed: the number of hidden nodes. Tests were run with 50, 100, and 200 hidden nodes to see whether the number of hidden nodes had any effect. For the knn classifier the only parameter changed was the number of neighbors, k. Four values of k were used: 1, 3, 5, and 7.
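To make the role of k concrete, here is a minimal k-nearest-neighbor sketch using Euclidean distance and a majority vote. It is an illustration under stated assumptions, not the LNKnet implementation: the training points are invented, and ties are resolved in favor of the label seen first among the nearest neighbors.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Classify query by majority vote among its k nearest training patterns.

    train: list of (feature_vector, label) pairs
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented 2-feature training set (assumption): one class-1 pattern sits
# inside the class-0 cluster, so the answer depends on k.
train = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
    ((5.0, 5.0), 1), ((5.0, 6.0), 1), ((0.9, 0.9), 1),
]
query = (0.8, 0.8)
label_k1 = knn_classify(train, query, 1)
label_k3 = knn_classify(train, query, 3)
print(label_k1, label_k3)
```

With k = 1 the query takes the label of the single closest pattern; with k = 3 the two next-closest patterns outvote it. This is why the error rate can move in either direction as k changes.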

Table 4.14 and Table 4.15 show the results of this group of classification testing using the multilayer perceptron. The best error rate for this set of features was 46% for 100 hidden nodes using the multilayer perceptron and 41.5% using the 1-nn. Table 4.16 and Table 4.17 show the results of this group of classification testing using the knn classifier. Each confusion matrix corresponds to the best recognition rate.

Table 4.14 MLP test results using 49 features from the magnitude and phase of the 7x7 low-pass spatial filter

Hidden Nodes   % Error (testing)   Rmserr (testing)   % Error (training)
50             50                  .41                22
100            46                  .4                 21
200            47                  .44                12

Table 4.15 Confusion matrix for magnitude and phase using 100 hidden nodes in the multilayer perceptron

Actual class   Classified as
               0    1    2    3
0              24   5    1    20
1              2    35   6    7
2              3    26   17   4
3              11   6    2    31

Table 4.16 K-nn test results using magnitude and phase features from the coefficients resulting from a 7x7 low-pass spatial frequency filter

k   % Error (testing)   Rmserr (testing)
1   41.5                .456
3   45.5                .395
5   46.5                .374
7   42.5                .369

4.9.2 Results of a 4-class Problem Using Phase Features Only. The next set of features was based only on the phase information of the coefficients from the 7x7 low-pass spatial frequency filter. The feature set consisted of 24 terms per sample. The knn performed better than the multilayer perceptron. The top recognition rate of 62.0% occurred with 7 nearest neighbors. This is the first experiment in which more than 1 nearest neighbor gave a better recognition rate; in this case 7 nearest neighbors achieved the lowest test error.
Table 4.17 Confusion matrix for magnitude and phase using a 1-nn classifier

Actual class   Classified as
               0    1    2    3
0              26   6    3    15
1              3    36   8    3
2              4    16   28   2
3              17   1    5    27

Tables 4.18 and 4.20 show the test results for each classifier. Tables 4.19 and 4.21 list the confusion matrices for the best recognition rates. Although the phase contains most of the information in the Fourier Transform, it did not yield very separable features.

Table 4.18 MLP Test Results Using 24 Features From the Phase of the 7x7 Low-pass Spatial Filter

Hidden Nodes   % Error (testing)   Rmserr (testing)   % Error (training)
50             71                  .44                58
100            60                  .43                50
200            58                  .44                40

Table 4.19 Confusion matrix for mlp with 200 hidden nodes using phase only features

Actual class   Classified as
               0    1    2    3
0              36   1    7    6
1              23   15   10   2
2              16   9    21   4
3              23   6    10   11
11 
Table 4.20 K-nn Test Results Using Phase Features From the Coefficients Resulting From a 7x7 Low-pass Spatial Frequency Filter

k   % Error (testing)   Rmserr (testing)
1   42.5                .461
3   39.5                .377
5   40.5                .357
7   38.0                .354

Table 4.21 Confusion matrix for 7-nn using phase only features

Actual class   Classified as
               0    1    2    3
0              45   0    1    4
1              6    24   16   4
2              9    10   28   3
3              16   5    2    27

4.9.3 Results of a 4-class Problem Using Magnitude Features Only. The next set of features was based only on the magnitude information. The knn performed slightly better than the multilayer perceptron. The top recognition rate of 59.5% occurred with 1 nearest neighbor, compared with 58% for the multilayer perceptron with 100 hidden nodes, so the two classifiers achieved similar classification results. Tables 4.22 and 4.24 show the test results for each classifier. Tables 4.23 and 4.25 list the confusion matrices for the best recognition rates.

Table 4.22 MLP test results using 25 features from the magnitude of the 7x7 low-pass spatial frequency filter

Hidden Nodes   % Error (testing)   Rmserr (testing)   % Error (training)
50             55                  .42                19
100            42                                     14
200            43                  .42                12
Table 4.23 Confusion matrix for magnitude only using 100 hidden nodes in the mlp

Actual class   Classified as
               0    1    2    3
0              30   3    2    15
1              6    23   15   6
2              7    11   28   4
3              9    3    3    35

Table 4.24 K-nn test results using magnitude features from the coefficients resulting from a 7x7 low-pass spatial frequency filter

k   % Error (testing)   Rmserr (testing)
1   40.5                .466
3   44.5                .381
5   40.5                .380
7   46.5                .388

Table 4.25 Confusion matrix for magnitude features using a 5-nn classifier

Actual class   Classified as
               0    1    2    3
0              26   6    3    15
1              3    36   8    3
2              4    16   28   2
3              17   1    5    27
4.10 Results of a 4-class Problem Using Imaginary Components Only

The next set of features was based only on the imaginary components of the Fourier coefficients. The multilayer perceptron performed slightly better than the knn. The top recognition rate of 59% occurred with 200 hidden nodes in the multilayer perceptron, compared with 56% for the knn. Again, the multilayer perceptron and knn classifiers achieved similar classification results. The results are also similar to the results of the phase only features. Tables 4.26 and 4.28 show the test results for each classifier. Tables 4.27 and 4.29 list the confusion matrices for the best recognition rates.

Table 4.26 MLP Test Results Using 24 Features From the Imaginary Components of the 7x7 Low-pass Spatial Filter

Hidden Nodes   % Error (testing)   Rmserr (testing)   % Error (training)
50             42                  .39                10
100            43                  .41                9
200            41                  .42                10

Table 4.27 Confusion matrix for imaginary components only using mlp

Actual class   Classified as
               0    1    2    3
0              38   1    2    9
1              7    25   13   5
2              8    11   25   6
3              13   2    5    30

Table 4.28 K-nn Test Results Using Imaginary Components From the Coefficients Resulting From a 7x7 Low-pass Spatial Frequency Filter

k   % Error (testing)   Rmserr (testing)
1   44.0                .469
3   45.5                .413
5   44.0                .389
7   45.5                .393
Table 4.29 Confusion matrix for imaginary features using a 1-nn classifier

Actual class   Classified as
               0    1    2    3
0              25   2    17   6
1              2    22   25   1
2              1    7    41   1
3              17   2    7    24

4.11  Results  of  a  4-class  Problem  Using  Combination  of  KLT  Features 

The final feature set developed was based on the Karhunen-Loeve Transform. In this test, ten features resulting from KL transforming the original 441 Fourier features were used in an add-on procedure. The best combination of five was kept as the feature set. The results of this testing show that the features that made up the best combination were the five coefficients corresponding to the largest eigenvalues. These features represent the orthogonal directions in the feature space with the most variance. Tables 4.30 and 4.31 show the test results for each classifier. Tables 4.32 and 4.33 list the confusion matrices for the best recognition rates. The average recognition rate of 76.2% over three trials for the multilayer perceptron is the best rate for any feature subset selected. Performing the Karhunen-Loeve Transform on the correlated data generated by the Fourier Transform yields new KL components that are uncorrelated (18). This improves recognition performance. Figure 4.8 plots each data sample when the Karhunen-Loeve transformation reduced the feature space to 2 dimensions. The patterns are well mixed together in two dimensions and the resulting test error was 45%.

Table 4.30 Best Combinations of Top 10 KLT Features and the Resulting MLP Test Error

Feature     % Error (testing)   Rmserr (testing)   % Error (training)
1           53                  .46                64.8
1,2         45                  .38                53.5
1,2,4       34.5                .35                42.3
1,2,4,5     28                  .29                23.5
1,2,4,5,3   23.8                .28                19.33
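The decorrelating property of the Karhunen-Loeve Transform can be seen in a small example. The sketch below is a deliberately reduced 2-dimensional illustration, not the 441-dimensional transform used in this research. It assumes made-up data points and uses the closed-form principal-axis angle of a 2x2 covariance matrix in place of a general eigen-decomposition. After projecting onto the eigenvector directions, the off-diagonal covariance is zero up to rounding, and the first component carries the larger variance.

```python
import math


def covariance_2d(points):
    """Return the mean and the covariance terms (cxx, cyy, cxy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), cxx, cyy, cxy


def kl_transform_2d(points):
    """Project centered points onto the eigenvectors of their covariance."""
    (mx, my), cxx, cyy, cxy = covariance_2d(points)
    # Angle of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - mx) * c + (y - my) * s,
             -(x - mx) * s + (y - my) * c) for x, y in points]


# Made-up, strongly correlated 2-D data (assumption, for illustration).
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.2)]
transformed = kl_transform_2d(points)

_, cxx0, cyy0, cxy0 = covariance_2d(points)
_, cxx1, cyy1, cxy1 = covariance_2d(transformed)
print("covariance before:", round(cxy0, 6))
print("covariance after: ", round(cxy1, 6))
print("variances after:  ", round(cxx1, 6), round(cyy1, 6))
```

Keeping only the components with the largest variance, as in the five-feature set above, corresponds to dropping the trailing coordinates of the transformed points.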
Table 4.31 K-nn Test Results Using 5 KLT Coefficients

k   % Error (testing)   Rmserr (testing)
1   33.5                .409
3   30.0                .329
5   26.0                .311
7   24.0                .301

Table 4.32 Confusion matrix for best combination of 5 KLT features using mlp (76.2% accuracy)

Actual class   Classified as
               0    1    2    3
0              34   1    4    11
1              2    34   11   3
2              0    3    47   0
3              12   0    1    37

Table 4.33 Confusion matrix for best combination of 5 KLT features using 7-nn classifier (76% accuracy)

Actual class   Classified as
               0    1    2    3
0              41   4    4    1
1              1    34   15   0
2              1    2    47   0
3              15   1    4    30
Figure 4.8 Patterns plotted for 2 dimensions of the KL transform

4.12 Images

Figure 4.9 shows a collection of commonly misclassified patterns. Many of the patterns have some type of stray lines in the image, or the word itself has characteristics of cursive script. One step in analyzing why some patterns were correctly classified and others were not may be to look at the reconstruction of the images from the features used to classify them. In the following figures, two classes were analyzed. For each class, a correctly classified pattern and a misclassified pattern were reconstructed from the lower 3 harmonics to see if any noticeable difference could be seen. It does not appear from just a few images that there is any distinguishable difference. However, good reconstruction of the image does not mean those features will be good for recognition.
Figure 4.11 Correctly classified pattern in class 0, reconstructed using the lower three harmonics

Figure 4.12 Misclassified pattern in class 0

Figure 4.13 Incorrectly classified pattern in class 0, reconstructed using the lower three harmonics

Figure 4.14 Correctly classified pattern in class 3

Figure 4.15 Correctly classified pattern in class 3, reconstructed using the lower three harmonics

Figure 4.17 Incorrectly classified pattern in class 3, reconstructed using the lower three harmonics
4.13 Generalization of Recognition

What confidence can be placed on the results obtained? A good rule of thumb for the design of a pattern recognition system is to use half or less of the data for training (6). All the results obtained used exactly half of the data for training and half for testing. Another standard for the design of the system is to use on the order of ten times c_k training patterns per class, where c_k = 2(K + 1) and K is the number of features (30). The best results were achieved using 5 KLT Fourier features, so in this case c_k equals 12. This would require 120 patterns per class for training. Only 50 patterns per class were used for training the system. Ideally, more patterns are required for better generalization. The true error rate is estimated to be in the range of 0.22 to 0.34 (19).
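The training-set guideline above is a one-line calculation. The helper below is a sketch with an illustrative function name. It restates c_k = 2(K + 1) and the factor of ten, then evaluates them for the 5-feature KLT set.

```python
def patterns_per_class_needed(num_features, multiple=10):
    """Rule-of-thumb training patterns per class: multiple * 2 * (K + 1)."""
    c_k = 2 * (num_features + 1)
    return multiple * c_k


needed = patterns_per_class_needed(5)   # K = 5 KLT features, so c_k = 12
available = 50                          # training patterns per class used here
print(needed, available, needed - available)
```

By this rule the 5-feature experiments fall 70 training patterns per class short of the guideline.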


4.14  Conclusion 

This chapter reported the results of testing several feature sets on both a 2-class and a 4-class problem. Table 4.34 summarizes the results of each feature set used for the 4-class problem.


Table 4.34 Summary of 4-class testing

Feature Set              Recognition Rate (%)   Classifier
7x7 Spatial Filter       74.5                   1-nn
Figure-of-Merit          73                     mlp
Magnitude and Phase      58.5                   1-nn
Magnitude Only           59.5                   1-nn
Phase Only               62                     7-nn
Imaginary coefficients   59                     mlp
Karhunen-Loeve           76.2                   mlp

The 76.2% recognition rate using the Karhunen-Loeve transform of Fourier features was the top rate. Three of the feature sets resulted in recognition rates greater than 70%. Good recognition was achieved for the figure-of-merit and Karhunen-Loeve feature sets using only 5 features. The high recognition rates indicate that Fourier Transform features are valuable in recognizing handwritten words.
V. Conclusions

The purpose of this thesis was to examine the use of Fourier Transform coefficients for the recognition of handwritten words. The pattern recognition problem consisted of classifying four handwritten words: 'Buffalo', 'Vegas', 'Washington', and 'City'. Based on the past success of Fourier coefficients in recognizing handwritten letters and machine-typed words, the logical next step was to analyze the recognition capability of Fourier coefficients of handwritten words. The analysis concentrated on searching for subsets of features from the Fourier coefficients computed from the word images. Several feature sets were generated, including the coefficients from the 7x7 spatial frequency filter, figure-of-merit features, magnitude and phase, and Karhunen-Loeve features. Two methods of classification were used: the multilayer perceptron and the k-nearest neighbor. LNKnet software provided the classification support and Khoros (14) provided image processing support.

This effort was a first cut at the problem of recognizing handwritten words. A more in-depth look at the Fourier coefficients was provided, including the separability of each individual feature and the role played by the magnitude and phase. The figure-of-merit feature set resulted in a 73% recognition rate. Using the magnitude and phase features together resulted in a 58.5% recognition rate, while magnitude features alone resulted in 59.5% and phase alone resulted in 62% recognition. Using the standard lower 3 harmonics, which had great success separating handwritten letters, produced a 74.5% recognition rate. In addition, applying the Karhunen-Loeve transform to further search for features with separability was examined. This resulted in the top performance of 76.2%. Other methods of feature selection included the add-on procedure used during classification. All these techniques were used in an attempt to pare down the enormous number of feature combinations to find the best set of features for classification. The result was a sufficient examination of Fourier coefficients used alone as features.

The results indicate that Fourier coefficients are beneficial to some degree in classifying handwritten words. The best recognition performance of 76.2% was achieved when the Karhunen-Loeve transform was computed on the Fourier coefficients. The variability in the handwritten word is difficult for the Fourier coefficients to overcome. Although some recognition ability does remain, recognizing handwritten words requires more than just feature sets based on Fourier coefficients.

This leads to recommendations for future research in classifying handwritten words. The first recommendation is to incorporate a feedback mechanism into the classification process. The Fourier coefficients provide a good initial feature set for classification and can achieve 76.2% recognition, but more information is required for commercial applications. Using a window across the image to compute Fourier coefficients may be a way to gather more information to feed back into the classifier. The second recommendation is to fuse the results of two or more independent classifiers. The probabilities of a sample being assigned to each class could be weighted and combined in a voting scheme.
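One simple form of the suggested fusion is a weighted sum of the per-class probabilities produced by each classifier, followed by picking the class with the largest fused score. The sketch below is only one possible reading of that recommendation. The function name, the probability values, and the equal weights are assumptions for illustration and were not evaluated in this research.

```python
def fuse_classifiers(probabilities, weights):
    """Weighted vote over per-class probabilities from several classifiers.

    probabilities: one list of class probabilities per classifier
    weights:       one non-negative weight per classifier
    Returns (winning_class_index, normalized_fused_scores).
    """
    n_classes = len(probabilities[0])
    fused = [sum(w * p[c] for w, p in zip(weights, probabilities))
             for c in range(n_classes)]
    total = sum(fused)
    fused = [score / total for score in fused]
    winner = max(range(n_classes), key=fused.__getitem__)
    return winner, fused


# Hypothetical outputs for one word image over the four classes
# (0 - 'Buffalo', 1 - 'Vegas', 2 - 'Washington', 3 - 'City').
mlp_probs = [0.40, 0.10, 0.45, 0.05]
knn_probs = [0.55, 0.05, 0.30, 0.10]
winner, fused = fuse_classifiers([mlp_probs, knn_probs], [0.5, 0.5])
print(winner, [round(score, 3) for score in fused])
```

In this made-up case the first classifier alone would choose class 2, while the fused decision is class 0. The weights could instead reflect each classifier's measured recognition rate.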
Appendix A. A LNKnet Helper

This appendix is a quick tutorial on the program LNKnet. To start the program from your directory, put the following statement in the .cshrc file: set path=($path /home/cub7/LNKnet/bin). Now you should be set to call LNKnet. Do this by typing 'LNKnet' on the command line. A window will appear which is titled "Experimental Control." If you have not obtained a copy of the help manual, do so, because it explains the various control parameters. The following tips will help.

• Call LNKnet from the directory that is one directory above where the data you want to use is located. For example, if you have the data files called xor.train and xor.test located in the directory called lnknet/data, then go to the lnknet directory on the command line and call up LNKnet.

• The top of the experimental control lists the parameter ALGORITHM. This allows you to choose a classification method.

• Directly below ALGORITHM is Algorithm Params... in which you can define the specific parameters of the classification technique.

• Under FILE NAMES: Just enter a name for Exper. Name and hit return and the other file names will be generated automatically. For Exper. Path enter 'data/' followed by a carriage return if the data is set up the same way as described in the first hint. All the files generated when the program runs will be put in the directory designated by the Exper. Path.

• For STANDARD DATA SET, enter NONE, unless you want to use the standard data sets found in the /home/cub7/LNKnet directory.

• For Data Path, enter the directory the data is found in, i.e., 'data' for the example above. For data file prefix enter 'xor' for the example above. It will search for xor.train and xor.test. The entries for the number of features and output classes are self-explanatory.

• Then go to the left side of the experimental control and enter the number of training patterns and test patterns. Click on the box next to the desired function, whether it be train, test on training data, eval, or test.

• At this point hit START and you are on your way. The results will be output to the command tool and also to the log file.

• When entering parameters, be sure to hit a carriage return. The script file at the end of this appendix runs LNKnet's multilayer perceptron, testing after each training epoch. Depending on the particular problem you have, several parameters will need to be changed. The man pages for LNKnet list all the parameters.

• LNKnet needs a certain format for the data file. Separate each pattern by a carriage return and separate each feature by a space. To indicate the class of each pattern, place an integer before the first feature of each pattern. For example: 0 24.3 35.6 ... <cr>. The first class begins with a zero, so class 2 would begin with a 1, and so on.
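The data file layout described in the last tip (one pattern per line, an integer class index starting at zero, then space-separated features) is easy to generate from other tools. The sketch below writes that layout as described in this appendix only. The function names and the file name in the usage note are hypothetical, and LNKnet may impose further requirements that are not covered here.

```python
def format_lnknet_pattern(class_index, features):
    """Return one data-file line: class index, then space-separated features."""
    fields = [str(int(class_index))] + [repr(float(x)) for x in features]
    return " ".join(fields)


def write_lnknet_file(path, patterns):
    """Write (class_index, features) pairs, one pattern per line."""
    with open(path, "w") as handle:
        for class_index, features in patterns:
            handle.write(format_lnknet_pattern(class_index, features) + "\n")


# Two made-up patterns: the first class is written as 0, the second as 1.
example_patterns = [(0, [24.3, 35.6]), (1, [12.0, 7.5])]
example_lines = [format_lnknet_pattern(c, f) for c, f in example_patterns]
print("\n".join(example_lines))
```

Calling write_lnknet_file("data/xor.train", example_patterns) would then produce a file in the layout the xor example above expects, assuming the data directory exists.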

That is a quick summary of the basics you will need to run the program. Most run-time errors occur because the program can't find the data files or the data files are in the wrong format.

When LNKnet is executed, it runs a series of script files which it puts in the log file. These script files can be edited to fit your needs. For example, if you want to test after each epoch of training, generate a script file to do this. The following script file is an example of how this is done.


#!/bin/csh
# ftrs/2c03mlp.run
set loc=`pwd`
set epochs_left = 25
(time mlp \
  -train -create -pathexp $loc -ferror 2c03mlp.err.train -fparam 2c03mlp.param\
  -fpid 2c03mlp.pid -pathdata /tmp_mnt/home/hawkeye7/gshartle/lnknet/ftrs/\
  -finput test.train -fdescribe test.train.defaults\
  -nraw 10 -npatterns 18 -normalize -fnorm test.norm.simple\
  -cross_valid 0 -fcross_valid test.train.cv -random -seed 0\
  -priors_npatterns 18 -debug 0 -verbose 3 -verror 2 -nodes 10,50,2\
  -alpha 0 -etta 0.1 -epsilon 0.1 -kappa 0.01 -decay 0 -tolerance 0.2\
  -hfunction 0 -ofunction 0 -param 1 -criterion 0 -epochs 1 -batch 1,1,0\
  ) \
  |& nn_tee -h 2c03mlp.log

# test after first epoch
(time mlp -create \
  -pathexp $loc -ferror 2c03mlp.err.test -fparam 2c03mlp.param\
  -fpid 2c03mlp.pid -pathdata /tmp_mnt/home/hawkeye7/gshartle/lnknet/ftrs\
  -finput test.test -fdescribe test.test.defaults -nraw 10\
  -npatterns 18 -normalize -fnorm test.norm.simple -cross_valid 0\
  -fcross_valid test.test.cv -random -seed 0 -priors_npatterns 18\
  -debug 0 -verbose 3 -verror 2 -nodes 10,50,2 -alpha 0.7 -etta 0.05\
  -epsilon 0.1 -kappa 0.01 -decay 0 -tolerance 0.2 -hfunction 0\
  -ofunction 0 -param 1 -criterion 0 -epochs 25 -batch 1,1,0 ) \
  |& nn_tee -h -a 2c03mlp.log

@ epochs_left -= 1
while ($epochs_left > 1)
  echo $epochs_left

  # train each remaining epoch
  (time mlp \
    -train -pathexp $loc -ferror 2c03mlp.err.train -fparam 2c03mlp.param\
    -fpid 2c03mlp.pid -pathdata /tmp_mnt/home/hawkeye7/gshartle/lnknet/ftrs/\
    -finput test.train -fdescribe test.train.defaults\
    -nraw 10 -npatterns 18 -normalize -fnorm test.norm.simple\
    -cross_valid 0 -fcross_valid test.train.cv -random -seed 0\
    -priors_npatterns 18 -debug 0 -verbose 3 -verror 2 -nodes 10,50,2\
    -alpha 0 -etta 0.1 -epsilon 0.1 -kappa 0.01 -decay 0 -tolerance 0.2\
    -hfunction 0 -ofunction 0 -param 1 -criterion 0 -epochs 1 -batch 1,1,0\
    ) \
    |& nn_tee -h -a 2c03mlp.log

  # test after each epoch of training
  (time mlp \
    -pathexp $loc -ferror 2c03mlp.err.test -fparam 2c03mlp.param\
    -fpid 2c03mlp.pid -pathdata /tmp_mnt/home/hawkeye7/gshartle/lnknet/ftrs\
    -finput test.test -fdescribe test.test.defaults -nraw 10\
    -npatterns 18 -normalize -fnorm test.norm.simple -cross_valid 0\
    -fcross_valid test.test.cv -random -seed 0 -priors_npatterns 18\
    -debug 0 -verbose 3 -verror 2 -nodes 10,50,2 -alpha 0.7 -etta 0.05\
    -epsilon 0.1 -kappa 0.01 -decay 0 -tolerance 0.2 -hfunction 0\
    -ofunction 0 -param 1 -criterion 0 -epochs 25 -batch 1,1,0 ) \
    |& nn_tee -h -a 2c03mlp.log

  @ epochs_left -= 1
end

# train to get any remainder in the number of epochs
(time mlp \
  -train -pathexp $loc -ferror 2c03mlp.err.train -fparam 2c03mlp.param\
  -fpid 2c03mlp.pid -pathdata /tmp_mnt/home/hawkeye7/gshartle/lnknet/ftrs/\
  -finput test.train -fdescribe test.train.defaults\
  -nraw 10 -npatterns 18 -normalize -fnorm test.norm.simple\
  -cross_valid 0 -fcross_valid test.train.cv -random -seed 0\
  -priors_npatterns 18 -debug 0 -verbose 3 -verror 2 -nodes 10,50,2\
  -alpha 0 -etta 0.1 -epsilon 0.1 -kappa 0.01 -decay 0 -tolerance 0.2\
  -hfunction 0 -ofunction 0 -param 1 -criterion 0 -epochs 1 -batch 1,1,0\
  ) \
  |& nn_tee -h -a 2c03mlp.log

# do final test
(time mlp \
  -pathexp $loc -ferror 2c03mlp.err.test -fparam 2c03mlp.param\
  -fpid 2c03mlp.pid -pathdata /tmp_mnt/home/hawkeye7/gshartle/lnknet/ftrs\
  -finput test.test -fdescribe test.test.defaults -nraw 10\
  -npatterns 18 -normalize -fnorm test.norm.simple -cross_valid 0\
  -fcross_valid test.test.cv -random -seed 0 -priors_npatterns 18\
  -debug 0 -verbose 3 -verror 2 -nodes 10,50,2 -alpha 0.7 -etta 0.05\
  -epsilon 0.1 -kappa 0.01 -decay 0 -tolerance 0.2 -hfunction 0\
  -ofunction 0 -param 1 -criterion 0 -epochs 25 -batch 1,1,0 ) \
  |& nn_tee -h -a 2c03mlp.log

# plot results of testing, change ferror to plot training
plot_perr -pathexp $loc -ferror 2c03mlp.err.test -fplot 2c03mlp.perr.plot \
  -autoscale -xmin 0 -xmax 10000 -ymin 0 -ymax 100 -xstep 1000 -ystep 10\
  -line_type 1 -trials 36\
  -title "Norm:Simple Net:10,50,2 Step:0.1 Moment:0.6"
echo "current directory:" >> 2c03mlp.log
echo $loc >> 2c03mlp.log

# plot rms error
plot_rmserr -pathexp $loc -ferror 2c03mlp.err.test -fplot 2c03mlp.rmserr.plot \
  -autoscale -xmin 0 -xmax 10000 -ymin 0 -ymax 100 -xstep 1000 -ystep 10\
  -line_type 1 -trials 76\
  -title "Norm:Simple Net:10,50,2 Step:0.1 Moment:0.6"
echo "current directory:" >> 2c03mlp.log
echo $loc >> 2c03mlp.log
Appendix B. Source Code

B.1 Script Files

# Program to find selected cities in the cd rom
#!/bin/csh -f

setenv HIPSDIR /cdrom/train/cities/bd
set outdir = $HOME/cedar/data/train
set C = 0
set W = 0
set V = 0
unalias ls
cd $HIPSDIR
touch ~/temp/hips.junk
foreach j (bd*)
  cd $j
  echo $j

  set l = `ls -a bd*.*`
  foreach i ( $l )
    echo $i

    # extract the truth (label) line of the image header
    ~/cedar/bin/deltau < $HIPSDIR/$j/$i | ~/cedar/bin/hdinfo | grep Truth > ~/temp/junk.hips
    set a = ~/temp/junk.hips

    # class 1: 'Vegas'
    set yn = `grep 'Image' $a | awk '/[Vv][Ee][Gg][Aa][Ss]/ {print "y"}'`
    if ( $yn == "y" ) then
      ~/cedar/bin/deltau < $i | ~/cedar/bin/hips2sun > $outdir/1/$i;
      set V = (`echo "$V 1 + p" | dc`)
      echo 2,"" >>$outdir/temp/classes
      set k=`grep Image $a`
      echo $k" "$i" "tally = "$V, " class = 1" >>$outdir/temp/tally
    else
    endif
    set yn = "n"

    # class 2: 'Washington'
    set yn = `grep 'Image' $a | awk '/[Ww][Aa][Ss][Hh][Ii][Nn][Gg][Tt][Oo][Nn]/ {print "y"}'`
    if ( $yn == "y" ) then
      ~/cedar/bin/deltau < $i | ~/cedar/bin/hips2sun > $outdir/2/$i;
      set W = (`echo "$W 1 + p" | dc`)
      echo 3,"" >>$outdir/temp/classes
      set k=`grep Image $a`
      echo $k" "$i" "tally = "$W, " class = 2" >>$outdir/temp/tally
    else
    endif
    set yn = "n"

    # class 3: 'City'
    set yn = `grep 'Image' $a | awk '/[cC][Ii][Tt][Yy]/ {print "y"}'`
    if ( $yn == "y" ) then
      ~/cedar/bin/deltau < $i | ~/cedar/bin/hips2sun > $outdir/3/$i;
      set C = (`echo "$C 1 + p" | dc`)
      echo 4,"" >>$outdir/temp/classes
      set k=`grep Image $a`
      echo $k" "$i" "tally = "$C, " class = 3" >>$outdir/temp/tally
    else
    endif
    set yn = "n"

  end
  cd ..
end

# Program to preprocess the image

#!/bin/csh -f
foreach i (b*.[0-9])

  # Window the binarized image and do the dft

  # convert the raster file of the image to viff format,
  # do a histogram stretch and threshold.

  tiff2viff -i $i -o /usr/tmp/two -v 0
  #rast2viff -i $i -o /usr/tmp/one -p 0
  #putimage -i /usr/tmp/one -update 2
  #vhstr -i /usr/tmp/one -o /usr/tmp/two -p 0
  vthresh -i /usr/tmp/two -o /usr/tmp/three -l 235 -v 255
  #putimage -i /usr/tmp/three -update 2

  viff2rast -i /usr/tmp/three -o test -p 0
  rast2viff -i test -o /usr/tmp/rast2HAAa14313 -p 0
  viff2pbm -i /usr/tmp/rast2HAAa14313 -o $i.asc -r 0

  # convert the binarized image back to asc,
  # crop the image, and do a dft

  #viff2pbm -i /usr/tmp/three -o $i.asc -r 0
  #~/bin/crop $i.asc $i.crop aux
  ~/bin/dft.all $i.asc bin.win.ftrs/$i.allpbo
  #~/bin/dft.all $i.crop bin.win.ftrs/$i.allpbo
  rm $i.asc
  #rm $i.crop

  # get the cropped dimensions from the .aux file

  #set dims = `head -2 aux`
  #echo $dims[1], $dims[2], $dims[3], $dims[4]

  # Window the original image and do the dft to get the features

  # crop the original image
  #putimage -i /usr/tmp/one -update 2
  #vextract -i /usr/tmp/one -o /usr/tmp/cut -x $dims[2] -y $dims[1] -w $dims[3] -h $dims[4]
  #putimage -i /usr/tmp/cut -update 2

  # convert from viff2pbm, run the dft routine to get the features from the unbinarized images

  #viff2pbm -i /usr/tmp/cut -o $i.asc -r 0
  #~/bin/dft.all $i.asc win.ftrs/$i.allpo
  #rm $i.asc

end


# Program to do addon testing

#!/bin/csh -f


echo "" > result.mlp.49
echo "" > result.mlp.49.avgtest
echo "" > result.mlp.49.avgtrain
echo "" > result.mlp.49.avgrmstest
echo "" > result.mlp.49.avgrmstrain
echo "" > feature

echo "features" "trial" "%error.test" "%error.train" "rms.err.test" "rms.err.train" "misses" >> result.mlp.49

echo "" >> result.mlp.49

set bestfeature = ` echo " 0 0 0 0 0 "`


foreach j (1 2 3 4 5)
if ($j != 1) then

set best = `awk '{print $1}' feature`
echo $best

set k = `echo "$j 1 - p" | dc`
while($k > 0)

set bestfeature[$k] = $best[$k]

@ k --

end

endif

echo $bestfeature

echo " " > test.features


foreach i (221 222 220 199 243 201 241)
echo $i.$j

if ($bestfeature[1] != $i && $bestfeature[2] != $i \
&& $bestfeature[3] != $i && $bestfeature[4] != $i \
&& $bestfeature[5] != $i) then


echo "" >> result.mlp.49

set avgtest = 0
set avgtrain = 0
set avgrmstest = 0
set avgrmstrain = 0
set trial = 1


gat.aach.faatur*  4caaBpa  .aomid  looitraat  400  441  ti 
rand4c  loaftrsat  taat. train  taat.taat  400  tj  list 
'/Inknat/ftrt/shooshS.ran  tj 


•  sat  up  a  fils  callad  arror  to  writs  all  tha  rasnlts  of  tasting  to. 

sat  z  ■  'sad  a/"(  "/"("/  3c03mlp.log|  grap  Ovarall  I  awk  *{printt4,t6}“ 
•sat  zl  ■  ‘sad  s/"(  ScOSalp.logl  grap  Ovarall  I  awk  ’-{printld.tS}" 

acho  "  "  >  arror 
acfao  ttx  »  arror 
acko  tz  >>  arror 

•  gat  tks  lowast  arror 
'/bin/gatarror 


sat  f  s  ‘  awk  ’{print  tj}’  laaturs' 
sat  zl  =  ‘  awk  ’{print  $0}’  arror.raport ' 
sat  yl  ■  ‘  awk  ’{print  10}’  Disclasslist* 
sat  zl  a  ‘  acho  ttyl' 

acho  tf  »  rssult_mlp_49 

acho  ti"  "ttrinl"  "tzl[l]"  "txl[2]"  "tzlC3}"  ”txl[43"  "  »  rasult.nlp.49 
•acho  ti"  "ttrinl"  "tzlCl]"  "txl[2]"  "  »  rasuit.nlp.49 

whilsCtzl  >  0) 

acho  "  "tylCtzl]  >>  rasult.mlp.49 

«  zl  —1 

and 
r«nd4c  foaitrsst  tast. train  taat.taat  400  tj  list
*/lnknat/ttrs/«hooslk3.run  $j 


sat  x«‘sad  a/"(  •7"(’7  3e03alp.logl  grap  Ovarall  I  avk  ’{printM.IO}’ ‘ 

•sat  x2  ■  ‘sad  s/'’(  "/"(”/  3c03sap.log|  grap  Ovarall  I  ask  >{printt4,l6}’ ‘ 

acho  "  "  >  arror 
acho  ttx  i>  arror 
acho  tx  >>  arror 
sat  trial  *  3 
'/bin/gatarror 

sat  x3  '  ‘  ask  ’<print  $0}’  arror, raport ‘ 
sat  j2  a  ‘  ask  '<print  to}'  atiselasslist ‘ 
sat  z2  a  '  acho  tty2‘ 

acho  tf  >>  rasult.alp_49 

acho  ti"  "ttrial"  "$x2[l]"  "tx2[3]"  "Ix2[3l"  "tx2[4]"  "  »  rasult_nlp.49 
•acho  ti"  "ttrial"  "tx2[l]"  "tx2[2]  »  rasnlt.Bap_49 
shilo(tz2  >  0) 

acho  "  "ty2Ctz2]  »  rasnlt.mlp.49 

•  z2  -ai 

and 


rand4c  fonftrsat  tast. train  tast.tast  400  tj  list 
'/lnknat/ltrs/shoosh3.run  tj 


sat  xa'sad  s/"(  "/"("/  2c03mlp.log|  grap  Ovarall  I  ask  ’{printtt.te}' ‘ 
sat  x3a‘sad  8/"{  "/"("/  2c03mlp.log|  grap  Ovarall  |  ask  '-(printt4.t6}* ‘ 

acho  "  "  >  arror 
acho  t*x  >>  arror 
echo $x >> error
••t  trial  ■  3 
■/bin/gatarror 

aat  z3  ■  '  avk  ’{print  t0>'  arror.raport ' 
sat  y3  ■  ‘  avk  ’{print  $0}'  nisclaasliat * 
aat  s3  ■  *  acbo  t*y3* 

aebo  tf  »  rasalt.Blp.49 

acho  li"  "Itrial"  "•x3[l]"  ”lx3[2]"  "Ix3[3]"  -|x3[4]"  "  »  rasnlt.mlp.49 
•aeho  ti”  "»trial"  "♦x3[l]”  ”9x3[2]-  "  »  rasnlt.mlp.49 

Bhila(tx3  >  0) 

aebo  ”  "tySCtzS]  »  rasalt.Blp.49 

•  z3  —1 

and 

set avgtest = `echo "2 k" $x1[1] $x2[1] $x3[1]"++ 3 / p"|dc`
set avgtrain = `echo "2 k" $x1[2] $x2[2] $x3[2]"++ 3 / p"|dc`
set avgrmstest = `echo "2 k" $x1[3] $x2[3] $x3[3]"++ 3 / p"|dc`
set avgrmstrain = `echo "2 k" $x1[4] $x2[4] $x3[4]"++ 3 / p"|dc`

echo "" >> result.mlp.49
echo "" >> result.mlp.49

echo " "avg" "$avgtest" "$avgtrain" "$avgrmstest" "$avgrmstrain >> result.mlp.49
echo "" >> result.mlp.49
echo "" >> result.mlp.49

echo $i" "$avgtest >> result.mlp.49.avgtest
echo $i" "$avgtrain >> result.mlp.49.avgtrain
echo $i" "$avgrmstest >> result.mlp.49.avgrmstest
echo $i" "$avgrmstrain >> result.mlp.49.avgrmstrain

echo $i" "$avgtest >> test.features


rm 2c03mlp.log
rm 2c03mlp.err.test
endif

end
~/bin/sort.best_feature


end


# Program to find the misclassified samples

#!/bin/csh -f

# read in the misclassified samples

set number = `awk '{print $1}' miss`
set desired = `awk '{print $3}' miss`
set classified = `awk '{print $6}' miss`

# write them to a file

echo "" > miss1
echo $#number >> miss1
echo $number >> miss1
echo $desired >> miss1
echo $classified >> miss1


# it calls the program miss.patterns to get the missed patterns

# and find the filename of the pattern so it can display it.

miss.patterns

#set patterns = `awk '{print $1}' misclasslist`
#foreach i ($patterns)

#rast2viff -i cedar/data/train/all.patterns/$i -o /usr/tmp/$i -p 0

#putimage -i /usr/tmp/$i -update 2

#end
B.2  C code


/************************************************
   Compute the 2dft of
   a NxM array of values

   This program reads an ascii file
   that has been generated by
   converting a raster file to pbm in
   khoros.

   It reads every third value in the
   array to account for the fact that
   the conversion was greyscale.
************************************************/

#include <stdio.h>
#include <math.h>

main (argc, argv)
int argc;
char *argv[];
{
    /**** true_height is the real array dim in y space
          true_width is the real array dim in x space ****/

    int true_height, true_width, ORDER, FILTER, x, y, i, j, k, l, waste2;
    float high, across, tempk, templ, cossin_term, norm, dc_term;
    float coeff[21][21];
    int city_name[1000][1000];
    int data[1000][1000], junk;
    char waste[3];   /* two-character magic number plus the terminating NUL */

    FILE *in_file, *out_file;

    if ((in_file=fopen(argv[1], "r"))==NULL)
        printf("can't open\n");

    out_file = fopen(argv[2],"w");

    /** read header info ***/
    fscanf (in_file,"%2s", waste);
    fscanf (in_file,"%d", &true_width);
    printf(" %d ", true_width);
    fscanf (in_file,"%d", &true_height);
    printf(" %d ",true_height);
    if (true_width>1000)
        printf("out of bounds\n");
    if (true_height>1000)
        printf("out of bounds");
    fscanf (in_file,"%d", &waste2);

    /** read in the matrix of values, every third value ***/
    for (x=0; x<true_height; x++)
    {
        for (y=0; y<true_width; y++)
        {
            fscanf (in_file,"%d %d %d", &data[x][y], &junk, &junk);
            city_name[x][y]=data[x][y];
        }
    }


    high = 2*M_PI/(true_height);
    across = 2*M_PI/(true_width);

    /*** FILTER = ORDER*2+1 where order is the number of harmonics ***/
    ORDER = 10;
    FILTER = ORDER*2+1;

    for (k=0; k<FILTER; k++)
        for (l=0; l<FILTER; l++)
            coeff[k][l] = 0.0;

    /* rows above the centre row: cosine sums in row k, sine sums in row 20-k */
    for (k=0; k<ORDER; k++)
        for (l=0; l<FILTER; l++)
        {
            tempk = (k-ORDER)*high;
            templ = (l-ORDER)*across;

            for (i=0; i<true_height; i++)
                for (j=0; j<(true_width); j++)
                {
                    cossin_term = -i*tempk-j*templ;
                    coeff[k][l] += city_name[i][j] * cos(cossin_term);
                    coeff[20-k][l] += city_name[i][j] * sin(cossin_term);
                }
        }

    /* centre row: cosine sums in column l, sine sums in column 20-l */
    k = ORDER;
    for (l=0; l<ORDER; l++)
    {
        templ = (l-ORDER)*across;

        for (i=0; i<true_height; i++)
            for (j=0; j<true_width; j++)
            {
                cossin_term = -j*templ;
                coeff[k][l] += city_name[i][j] * cos(cossin_term);
                coeff[k][20-l] += city_name[i][j] * sin(cossin_term);
            }
    }

    dc_term = 0;
    for (i=0; i<true_height; i++)
        for (j=0; j<true_width; j++)
            dc_term += city_name[i][j];
    printf("%f ",dc_term);
    coeff[ORDER][ORDER]=dc_term;

    /************** ENERGY NORMALIZE ***********************/
    norm = 0.0;
    for (k=0; k<FILTER; k++)
        for (l=0; l<FILTER; l++)
            norm += coeff[k][l]*coeff[k][l];

    norm = sqrt(norm);

    for (k=0; k<FILTER; k++)
        for (l=0; l<FILTER; l++)
            coeff[k][l] = coeff[k][l]/norm;

    /************** WRITE COEFFICIENTS TO FILE *************/
    for (k=0; k<FILTER; k++)
        for (l=0; l<FILTER; l++)
            fprintf(out_file,"%f \n",coeff[k][l]);

    fclose(in_file);
    fclose(out_file);

} /*** END MAIN ***/
/************************************************
   Program: crop.c

   Description: Crops a binarized image
************************************************/

#include <stdio.h>
#include <math.h>

main (argc, argv)
int argc;
char *argv[];
{
    /**** true_height is the real array dim in y space
          true_width is the real array dim in x space ****/

    int true_height, true_width, ORDER, FILTER, x, y, i, j, k, l, waste2;
    int city_name[1000][1000];
    int data[1000][1000], junk;
    int top, bottom, left_side, right_side, width, height;
    int count=0, val;
    char waste[3];

    FILE *in_file, *out_file, *out_file2;

    if ((in_file=fopen(argv[1], "r"))==NULL)
        printf("can't open\n");

    out_file = fopen(argv[2],"w");
    out_file2 = fopen(argv[3],"w");

    fscanf (in_file,"%s", waste);
    fscanf (in_file,"%d", &true_width);
    printf(" %d ",true_width);
    fscanf (in_file,"%d", &true_height);
    printf(" %d ",true_height);
    if (true_width>1000)
        printf("out of bounds\n");
    if (true_height>1000)
        printf("out of bounds");
    fscanf (in_file,"%d", &waste2);
    /** read in the matrix of values, every third value ***/
    top = -1;
    for (x=0; x<true_height; x++)
    {
        for (y=0; y<true_width; y++)
        {
            fscanf (in_file,"%d %d %d", &data[x][y], &junk, &junk);
            city_name[x][y]=data[x][y];

            /* the first row containing a black (0) pixel is the top edge */
            if ((top == -1) && (city_name[x][y] == 0) && (y != 0))
                top = x;
        }
    }

    left_side = -1;
    for (y=1; (y < true_width) && (left_side==-1); y++)
        for (x=top; x < true_height; x++)
            if (city_name[x][y] == 0)
                {left_side = y; break;}

    bottom = -1;
    for (x=true_height-1; (x > top) && (bottom == -1); x--)
        for (y=left_side; y < true_width; y++)
            if (city_name[x][y] == 0)
                {bottom=x; break;}

    right_side = -1;
    for (y=true_width-1; (y > left_side) && (right_side == -1); y--)
        for (x=top; x < bottom; x++)
            if (city_name[x][y] == 0)
                {right_side=y; break;}

    width = right_side - left_side + 1;
    height = bottom - top + 1;

    fprintf(out_file,"%s\n%d %d\n%d\n",waste, width, height,waste2);
    fprintf(out_file2,"%d %d %d %d",top,left_side,width,height);

    /* write width x height pixels so the data matches the header */
    for (x=top; x<=bottom; x++)
        for (y=left_side; y<=right_side; y++)
        {
            val = city_name[x][y];
            fprintf(out_file,"%3d %3d %3d ", val, val, val);
            if (++count == 4)
            {
                fprintf(out_file,"\n");
                count = 0;
            }
        }

    fclose(in_file);
    fclose(out_file);
    fclose(out_file2);
}

#include <stdio.h>
#include <stdlib.h>

#define SQR(x) (x)*(x)

/*
 * normalize: normalize input features based on nsamples or (nsamples-1)
 *            samples
 *
 * Inputs:
 *   data: nsamples x dim array of data-each row is a sample
 *   nsamples: number of training samples
 *   dim: number of elements in each training vector (before augmenting)
 *
 * Output (returned):
 *   normal: copy of data with all features in all samples normalized
 */

main(argc, argv)
int argc;
char *argv[];
/* Matrix normalize(Matrix data, int nsamples, int dim) */
{
    int i, j, num_patterns, num_features;
    float sum, sumsq, data[1200][1200], mean[1200], std[1200];
    FILE *infile, *outfile;
    /* Vector mean, std; */

    if (argc != 5)
    {
        fprintf(stderr, "%s: usage: %s <infile> <outfile> <number of features> <number of patterns>\n", argv[0], argv[0]);
        exit(1);
    }

    if ((infile = fopen(argv[1], "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
        exit(0);
    }

    if ((outfile = fopen(argv[2], "w")) == NULL)
    {
        fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
        exit(0);
    }

    num_features = atoi(argv[3]);
    num_patterns = atoi(argv[4]);

    for(i=0; i < num_patterns; i++)
        for(j=0; j< num_features; j++)
            fscanf(infile,"%f ",&data[i][j]);
    /* mean = v_alloc(dim);
       std = v_alloc(dim); */

    /* Compute mean and std vector: */
    for (j=0; j < num_features; j++)
    {
        sum = 0.0;
        sumsq = 0.0;
        for (i=0; i < num_patterns; i++)
        {
            sum += data[i][j];
            sumsq += SQR(data[i][j]);
        }
        mean[j] = sum / num_patterns;
        /* as written, std[j] holds the sample variance; no square root is taken */
        std[j] = (sumsq - (SQR(sum) / num_patterns))/(num_patterns-1);
    }

    /* Normalize and augment all the samples: */
    for (i=0; i < num_patterns; i++)
    {
        if (i != 0) fprintf(outfile,"\n");
        for (j=0; j < num_features; j++)
        {
            data[i][j] = (data[i][j] - mean[j]) / std[j];
            fprintf(outfile,"%f ",data[i][j]);
        }
    }

    /* free(mean);
       free(std); */
}


/************************************************
   Program: separate.c

   Description: Assigns a figure of merit
   for each dimension in
   n-dimensional feature space
************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "jkmacros.h"

main(argc, argv)
int argc;
char *argv[];
{
    int i,j,k;
    int num_classes, length, vectors_per_class, num_features, num_patterns;
    int feature_number[500];
    float class_mean[500][500],class_variance[500][500],temp1,temp2;
    float newdata_matrix[500][500],across_class_mean[500];
    float across_class_variance[500], mean_of_var[500];
    float var_of_means[500], fom_ordered_vector[500],fom[500];

    FILE *infile, *outfile;

    if (argc != 7)
    {
        fprintf(stderr, "%s: usage: %s <infile> <outfile> <number of classes> <vectors per class> <number of features > < number of patterns>\n", argv[0], argv[0]);
        exit(1);
    }

    if ((infile = fopen(argv[1], "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
        exit(0);
    }

    if ((outfile = fopen(argv[2], "w")) == NULL)
    {
        fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
        exit(0);
    }

    num_classes = atoi(argv[3]);
    vectors_per_class = atoi(argv[4]);
    num_features = atoi(argv[5]);
    num_patterns = atoi(argv[6]);

    /* read in the data file so that the first column is class 1, feature
       vector 1; the second column is feature vector 2 and so on. */
    for(i=1; i <= num_patterns; i++)
        for(j=1; j <= num_features; j++)
            fscanf(infile, "%f ",&newdata_matrix[j][i]);
    Loop1i(num_classes)
        Loop1j(num_features)
        {
            temp1 = 0.0;
            Loop1k(vectors_per_class)
            {
                temp1 += newdata_matrix[j][k + ((i - 1) * vectors_per_class)];
            }
            class_mean[j][i] = temp1/vectors_per_class;

            /* printf("class_mean %f\n", class_mean[j][i]);
               fprintf(outfile,"%f\n",class_mean[j][i]); */
        }

    Loop1i(num_classes)
        Loop1j(num_features)
        {
            temp2 = 0.0;
            Loop1k(vectors_per_class)
            {
                temp2 += (newdata_matrix[j][k + ((i - 1) * vectors_per_class)] - class_mean[j][i])
                       * (newdata_matrix[j][k + ((i - 1) * vectors_per_class)] - class_mean[j][i]);
            }
            class_variance[j][i] = temp2/vectors_per_class;

            /* printf("class var %f\n", class_variance[j][i]);
               fprintf(outfile, "%f\n",class_variance[j][i]); */
        }


    /************************************************
       Calculate across_class_mean and across_class_variance matrices
    ************************************************/

    Loop1i(num_features)
    {
        temp1 = 0.0;
        Loop1j(num_classes)
            temp1 += class_mean[i][j];
        across_class_mean[i] = temp1/num_classes;
    }

    Loop1i(num_features)
    {
        temp2 = 0.0;
        Loop1j(num_classes)
            temp2 += (class_mean[i][j] - across_class_mean[i]) * (class_mean[i][j] - across_class_mean[i]);
        across_class_variance[i] = temp2/num_classes;
    }

    /************************************************
       Calculate mean_of_var and var_of_mean vectors
    ************************************************/

    Loop1i(num_features)
    {
        temp1 = 0.0;
        temp2 = 0.0;
        Loop1j(num_classes)
        {
            temp1 += class_variance[i][j];
            temp2 += (class_mean[i][j] - across_class_mean[i]) * (class_mean[i][j] - across_class_mean[i]);
        }
        mean_of_var[i] = temp1/num_classes;
        var_of_means[i] = temp2/num_classes;
    }

    /* Calculate (Figure of Merit) fom vector */

    Loop1i(num_features)
    {
        fom_ordered_vector[i] = fom[i] = var_of_means[i]/mean_of_var[i];
        feature_number[i]=i;
    }

    /* Sort */

    piksr2(num_features,fom_ordered_vector, feature_number);

    /* piksr2 sorts in ascending order, so print from the end to list the best feature first */
    Loop1i(num_features)
        fprintf(outfile,"%d\t%f\n",feature_number[(num_features+1)-i],fom_ordered_vector[(num_features+1)-i]);
} /*end main*/
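
The figure of merit computed by separate.c is, for each feature, the variance of the class means divided by the mean of the class variances. The sketch below restates that ratio for a single feature. The function name `figure_of_merit` and the 0-based `double` arrays are assumptions for this illustration; the listing itself works on 1-based arrays through the `Loop1` macros.

```c
#include <assert.h>

/* Hypothetical helper: figure of merit for one feature, given the per-class
   mean and per-class variance of that feature.
   fom = (variance of the class means) / (mean of the class variances). */
double figure_of_merit(const double *class_mean, const double *class_var,
                       int num_classes)
{
    int c;
    double across_mean = 0.0, var_of_means = 0.0, mean_of_var = 0.0, d;

    assert(num_classes > 0);
    for (c = 0; c < num_classes; c++) {
        across_mean += class_mean[c];
        mean_of_var += class_var[c];
    }
    across_mean /= num_classes;
    mean_of_var /= num_classes;

    for (c = 0; c < num_classes; c++) {
        d = class_mean[c] - across_mean;
        var_of_means += d * d;
    }
    var_of_means /= num_classes;

    assert(mean_of_var > 0.0);   /* a feature with no within-class spread has no finite ratio */
    return var_of_means / mean_of_var;
}
```

A larger value means the class means are spread far apart relative to the scatter inside each class, which is why the listing ranks features by this number.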

/************************************************
   Program: klt.c

   Description: Program to calculate the eigenvectors of
   a set of n-dimensional data
************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "jkmacros.h"

main(argc, argv)
int argc;
char *argv[];
{
    FILE *infile, *outfile, *outfile2;

    int length,num_train,num_codewords,num_classes, num_eigvectors;
    int i,j,N,k,M,nrot;

    float **matrix(), *vector(), **A, **A_trans, **u, **L, **v, *d, *average_temp, temp;

    void free_vector(), free_matrix(),eigsrt(),jacobi();

    char type_name[30],avg_file[30], msg[30], msg1[30],filename[40],file[40];

    /* check the argument count before reading any of the arguments */
    if (argc != 7)
    {
        fprintf(stderr, "%s: usage: %s <infile> <outfile>\n", argv[0], argv[0]);
        exit(1);
    }

    length=atoi(argv[3]);
    num_train=atoi(argv[4]);
    num_eigvectors = atoi(argv[5]);

    if ((infile = fopen(argv[1], "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
        exit(0);
    }

    if ((outfile = fopen(argv[2], "w")) == NULL)
    {
        fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
        exit(0);
    }

    if ((outfile2 = fopen(argv[6], "w")) == NULL)
    {
        fprintf(stderr, "Couldn't open output file %s\n", argv[6]);
        exit(0);
    }


    /****** Allocate memory ******/

    A = matrix(1,length,1,num_train);
    A_trans = matrix(1,num_train,1,length);
    average_temp = vector(1,length);
    L = matrix(1,num_train,1,num_train);
    d = vector(1,num_train);
    v = matrix(1,num_train,1,num_train);

    /****** Initialize matrix and vectors ******/

    for(j=1;j<=num_train;j++)
        for(i=1;i<=length;i++)
            A_trans[j][i]=A[i][j]=average_temp[i]=0.0;

    printf("\nThe users being trained on are :\n\n");

    /* open_read(argv[1], "2c03.train"); */

    Loop1i(num_train)
    {
        Loop1j(length)
        {
            fscanf(infile, "%f", &A[j][i]);
        }
    }

    /************** average vector *******************/

    /* sprintf(avg_file, "avg.%s.dat", type_name); */

    /* open_write(arg[4], avg_file); */

    Loop1i(length)
    {
        temp = 0.0;
        Loop1j(num_train)
        {
            temp += A[i][j];
        }
        average_temp[i] = temp/num_train;
    }

    /**************Subtract average vector***********************/

    Loop1j(num_train)
        Loop1i(length)
            A[i][j] = A[i][j] - average_temp[i];
    free_vector(average_temp, 1, length);

    /*************Make transpose matrix************************/

    Loop1j(num_train)
        Loop1i(length)
            A_trans[j][i] = A[i][j];

    /*******************Matrix multiply A by itself***************/

    Loop1i(num_train)
        Loop1j(num_train)
        {
            temp = 0.0;
            Loop1k(length)
                temp = temp + A_trans[i][k] * A[k][j];
            L[i][j] = temp;
        }

    free_matrix(A_trans, 1, num_train, 1, length);

    /**************Do Jacobi rotation and sort eigenstuff**********/

    jacobi(L, num_train, d, v, &nrot);
    eigsrt(d, v, num_train);

    for (i=1; i<=num_eigvectors; i++)
    {   printf("eigenvalue %d is %f\n",i,d[i]);
        fprintf(outfile2,"%f\n",d[i]); }

    /*****************Find eigenvectors*************************/

    u = matrix(1, length, 1, num_train);
    Loop1i(num_train)
        Loop1j(length)
            u[j][i] = 0.0;

    Loop1i(num_train)
        Loop1j(num_train)
            Loop1k(length)
                u[k][i] = v[j][i] * A[k][j] + u[k][i];

    /*********** Write file containing list of eigenvector names********/

    /* sprintf(msg, "%s.train.out", type_name); */

    /* open_write(outfile, msg); */

    /* sprintf(msg1, "eigenvector", type_name); */

    /* Loop1i(num_eigvectors)
       {
           sprintf(file, "%s%d.dat", msg1, i);
           fprintf(outfile, "%s\n", file);
       }
       fclose(outfile); */

    /* close this for selected eigenvectors */

    Loop1i(num_eigvectors)
    {
        Loop1j(length)
            fprintf(outfile, "%g\n", u[j][i]);
    }

    fclose(outfile);

    /* to print selected eigenfeatures
    for(i=1; i <= num_eigvectors; i++)
        for(j=1; j <= length; j++)
        {   if (i==1)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==5)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==9)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==12) fprintf(outfile,"%g\n", u[j][i]);
            if (i==17) fprintf(outfile,"%g\n", u[j][i]);
            if (i==7)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==3)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==4)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==3)  fprintf(outfile,"%g\n", u[j][i]);
            if (i==24) fprintf(outfile,"%g\n", u[j][i]); } */

    /* each matrix is freed exactly once */
    free_matrix(A, 1, length, 1, num_train);
    free_matrix(u, 1, length, 1, num_train);
    free_matrix(L, 1, num_train, 1, num_train);
    free_matrix(v, 1, num_train, 1, num_train);
    free_vector(d, 1, num_train);

} /* end main*/

/************************************************
   Program: kltftrs.c

   Description: Program which takes the
   original data set and multiplies the
   eigenvectors calculated from klt.c to get
   the new set of features
************************************************/

#include <stdio.h>
#include <stdlib.h>

main(argc, argv)
int argc;
char *argv[];
{
    int i,j,k, num_eigenvec, num_features,num_patterns;
    float input[500][500], kltvec[500][500], sum[500][500], feat;

    FILE *infile, *infile2, *outfile;

    if (argc != 7)
    {
        fprintf(stderr, "%s: usage: %s <infile> <outfile>\n", argv[0], argv[0]);
        exit(1);
    }

    if ((infile = fopen(argv[1], "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
        exit(0);
    }

    if ((infile2 = fopen(argv[2], "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", argv[2]);
        exit(0);
    }

    if ((outfile = fopen(argv[3], "w")) == NULL)
    {
        fprintf(stderr, "Couldn't open output file %s\n", argv[3]);
        exit(0);
    }

    num_eigenvec = atoi(argv[4]);
    num_features = atoi(argv[5]);
    num_patterns = atoi(argv[6]);


    /** read in the values of the klt matrix **/
    for (i=0; i < num_eigenvec; i++)
        for (j=0; j < num_features; j++)
            fscanf (infile2, "%f ", &kltvec[i][j]);

    /** read in the data set to be reduced **/
    for (i=0; i < num_patterns; i++)
        for (j=0; j < num_features; j++)
            fscanf (infile, "%f ", &input[i][j]);

    /** establish the new set of features vector **/
    for (i=0; i < num_patterns; i++)
        for (j=0; j < num_eigenvec; j++)
            sum[i][j]=0.0;

    /** multiply input and klt to get new features **/
    for (i=0; i < num_patterns; i++)
        for (k = 0; k < num_eigenvec; k++)
            for (j=0; j < num_features; j++)
            {
                feat = input[i][j] * kltvec[k][j];
                sum[i][k] = sum[i][k] + feat;
            }

    /*** write to a file ***/
    for (i=0; i < num_patterns; i++)
    {   if (i != 0) fprintf(outfile, "\n");
        for (j=0; j < num_eigenvec; j++)
            fprintf(outfile, "%f ",sum[i][j]);
    }

    fclose(infile);
    fclose(infile2);
    fclose(outfile);

} /* end main */
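
The feature reduction in kltftrs.c is a set of dot products: each new feature is the inner product of a pattern with one eigenvector. A minimal sketch of that step for a single pattern follows. The function name `klt_project` and the flat row-major eigenvector array are assumptions for this illustration; the listing uses fixed-size two-dimensional arrays instead.

```c
#include <assert.h>

/* Hypothetical helper: project one pattern x (num_features values) onto
   num_eig eigenvectors. Eigenvector k occupies
   eig[k*num_features] .. eig[k*num_features + num_features - 1].
   out receives num_eig projected features. */
void klt_project(const double *x, const double *eig,
                 int num_eig, int num_features, double *out)
{
    int k, j;

    assert(num_eig > 0 && num_features > 0);
    for (k = 0; k < num_eig; k++) {
        out[k] = 0.0;
        for (j = 0; j < num_features; j++)
            out[k] += x[j] * eig[k * num_features + j];
    }
}
```

Choosing `num_eig` smaller than `num_features` is what reduces the dimensionality of the feature set.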


/************************************************
   Program: magphase.c

   Description: Computes magnitude and phase
   of the lower 3 harmonics
************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define ARC(a) (float)atan((double)(a))
#define PI 3.141592654

main (argc, argv)
int argc;
char *argv[];
{
    int i,j, k, junk, num_patterns, num_features;
    float mag[500][10][10], phase[500][10][10], data[500][10][10];
    float temp,temp1,temp2;

    FILE *file, *phase_file, *mag_file, *phase_only_file, *mag_phase_file, *mag_only_file;

    if (argc != 2)
    {
        fprintf(stderr, "%s: usage: %s <infile> <outfile>\n", argv[0], argv[0]);
        exit(1);
    }

    if ((file = fopen(argv[1], "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
        exit(0);
    }

    num_patterns = 400;
    num_features = 49;

    /* each pattern is a class id followed by a 7 x 7 block of coefficients */
    for (k=1; k <= num_patterns; k++)
    {   fscanf(file,"%d", &junk);
        for (i=1; i <= 7; i++)
            for (j=1; j <= 7; j++)
                fscanf(file, "%f",&data[k][i][j]); }


    for (k=1; k <= num_patterns; k++)
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
            {
                mag[k][i][j] = 0.0;
                phase[k][i][j] = 0.0;
            }

    /* cosine terms sit at [i][j]; the matching sine terms sit at [8-i][j],
       or at [4][8-j] for the centre row */
    for (k=1; k <= num_patterns; k++)
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                mag[k][i][j] = sqrt(data[k][i][j]*data[k][i][j] + data[k][8-i][j]*data[k][8-i][j]);

        i=4;
        for (j=1; j <= 3; j++)
            mag[k][i][j] = sqrt(data[k][i][j]*data[k][i][j] + data[k][i][8-j]*data[k][i][8-j]);
    }

    for (k=1; k <= num_patterns; k++)
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)

                if (data[k][i][j] > 0 && data[k][8-i][j] > 0)
                    phase[k][i][j] = (180/PI)*ARC((double)data[k][8-i][j]/data[k][i][j]);

                else if (data[k][i][j] > 0 && data[k][8-i][j] < 0)
                    phase[k][i][j] = 360 + (180/PI)*ARC((double)data[k][8-i][j]/data[k][i][j]);

                else if (data[k][i][j] < 0 && data[k][8-i][j] > 0)
                    phase[k][i][j] = 180 + (180/PI)*ARC((double)data[k][8-i][j]/data[k][i][j]);

                else if (data[k][i][j] < 0 && data[k][8-i][j] < 0)
                    phase[k][i][j] = 180 + (180/PI)*ARC((double)data[k][8-i][j]/data[k][i][j]);

        i=4;
        for (j=1; j <= 3; j++)

            if (data[k][i][j] > 0 && data[k][i][8-j] > 0)
                phase[k][i][j] = (180/PI)*ARC((double)data[k][i][8-j]/data[k][i][j]);

            else if (data[k][i][j] > 0 && data[k][i][8-j] < 0)
                phase[k][i][j] = 360 + (180/PI)*ARC((double)data[k][i][8-j]/data[k][i][j]);

            else if (data[k][i][j] < 0 && data[k][i][8-j] > 0)
                phase[k][i][j] = 180 + (180/PI)*ARC((double)data[k][i][8-j]/data[k][i][j]);

            else if (data[k][i][j] < 0 && data[k][i][8-j] < 0)
                phase[k][i][j] = 180 + (180/PI)*ARC((double)data[k][i][8-j]/data[k][i][j]);
    }


    /* print to a file to verify it works */

    phase_file = fopen("phase","w");
    k=1;
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                fprintf(phase_file, "%f\n", phase[k][i][j]);

        i=4;
        for (j=1; j <= 3; j++)
            fprintf(phase_file, "%f\n", phase[k][i][j]);
    }

    mag_file = fopen("magnitude","w");
    k=1;
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                fprintf(mag_file, "%f\n", mag[k][i][j]);

        i=4;
        for (j=1; j <= 3; j++)
            fprintf(mag_file, "%f\n", mag[k][i][j]);
    }


    /* print a new feature set */
    /* print mag and phase */

    mag_phase_file = fopen("mag_phase.ftrs","w");
    for (k=1; k <= num_patterns; k++)
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                fprintf(mag_phase_file, "%f ", mag[k][i][j]);
        i=4;
        for (j=1; j <= 3; j++)
            fprintf(mag_phase_file, "%f ", mag[k][i][j]);

        j=4; /*print dc term */
        fprintf(mag_phase_file, "%f ", data[k][i][j]);

        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                fprintf(mag_phase_file, "%f ", phase[k][i][j]);
        i=4;
        for (j=1; j <= 3; j++)
            fprintf(mag_phase_file, "%f ", phase[k][i][j]);
        fprintf(mag_phase_file, "\n");
    }

    /* print phase only */

    phase_only_file = fopen("phase_ftrs","w");
    for (k=1; k <= num_patterns; k++)
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                fprintf(phase_only_file, "%f ", phase[k][i][j]);
        i=4;
        for (j=1; j <= 3; j++)
            fprintf(phase_only_file, "%f ", phase[k][i][j]);
        fprintf(phase_only_file, "\n");
    }

    /* print mag only */

    mag_only_file = fopen("mag_ftrs","w");
    for (k=1; k <= num_patterns; k++)
    {
        for (i=1; i <= 3; i++)
            for (j=1; j <= 7; j++)
                fprintf(mag_only_file, "%f ", mag[k][i][j]);

        i=4;
        for (j=1; j <= 3; j++)
            fprintf(mag_only_file, "%f ", mag[k][i][j]);

        j=4; /*print dc term */
        fprintf(mag_only_file, "%f ", data[k][i][j]);
    }

} /** end main **/


/************************************************
   Program: getfom.c

   Description: Builds a feature set from
   selected features of the 441 calculated
   from the original feature set.
************************************************/

#include <stdio.h>
#include <stdlib.h>

main(argc, argv)
int argc;
char *argv[];
{
    int i, j, done = 0, num_patterns, num_features,junk,dimensions;
    float val,data[800][800];
    char filename[80], temp[20];

    FILE *file, *outfile;


static int class[] =
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 };


/* added an extra zero in class[] to account for the for loop below
beginning with i=1 */


static int selected_features[] =
{ 221, 222, 220, 199, 243, 201, 241,
200, 242, 177, 261, 178, 262,
179, 263, 180, 264, 181, 265,
198, 240, 202, 244, 219, 223,
155, 281, 156, 282, 157, 283,
158, 284, 159, 285, 160, 286,
161, 287, 176, 260, 197, 239,
182, 266, 203, 245, 218, 224 };


/* static int selected_features[] =
{ 245 }; */


if (argc != 6)
{
    fprintf(stderr, "%s: usage: %s <infile> <outfile> <num_patterns> <num_features> <dimensions>\n", argv[0], argv[0]);
    exit(1);
}

if ((file = fopen(argv[1], "r")) == NULL)
{
    fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
    exit(0);
}

if ((outfile = fopen(argv[2], "w")) == NULL)
{
    fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
    exit(0);
}

num_patterns = atoi(argv[3]);
num_features = atoi(argv[4]);
dimensions = atoi(argv[5]);

/* read each pattern; the first field on a line is read into junk and discarded */
for (i=1; i < num_patterns; i++)
    for (j=0; j < num_features; j++)
        if (j==0) fscanf(file, "%d", &junk);
        else
            fscanf(file, "%f", &data[i][j]);

/* write the class id followed by the selected features of each pattern */
for (i=1; i < num_patterns; i++)
{
    fprintf(outfile, "%d ", class[i]);
    for (j=0; j < dimensions; j++)
        fprintf(outfile, "%f ", data[i][selected_features[j]]);
    fprintf(outfile, " \n");
}

fclose(file);
fclose(outfile);
}
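The indices in selected_features[] are easier to read as positions in a coefficient grid. The sketch below is illustrative only and is not part of the original program. It assumes the 441 features form a 21 x 21 grid stored row by row and numbered from 1, which the listing does not state; GRID_SIZE and feature_offset are hypothetical names. Under that assumption index 221 is the centre of the grid, and pairs such as 199 and 243 sit at mirrored offsets around it.

```c
#include <assert.h>

#define GRID_SIZE 21   /* assumption: 21 x 21 = 441 coefficients */

/* Hypothetical helper: convert a 1-based feature index (1..441) into
   row and column offsets measured from the centre of the grid. */
void feature_offset(int index, int *d_row, int *d_col)
{
    int zero_based, centre;

    assert(index >= 1 && index <= GRID_SIZE * GRID_SIZE);
    zero_based = index - 1;
    centre = GRID_SIZE / 2;                     /* 10 for a 21 x 21 grid */
    *d_row = zero_based / GRID_SIZE - centre;   /* row-major layout assumed */
    *d_col = zero_based % GRID_SIZE - centre;
}
```

Calling feature_offset on each entry of selected_features[] would show how far each chosen coefficient lies from the centre, again only if the row-major assumption holds.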

/*********************************************
Program: addclassid.c

Description: Adds classid's to the
feature set for use in LNKnet

*********************************************/

#include <stdio.h>
#include <stdlib.h>   /* atoi, exit */

int main(int argc, char *argv[])
{
int i, j=0, done = 0, num_patterns, num_features;
float val, data[800][800];
char filename[80], temp[20];

FILE *file, *outfile;

static int class[] =
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 };


if (argc != 5)
{
    fprintf(stderr, "%s: usage: %s <infile> <outfile> <num_patterns> <num_features>\n", argv[0], argv[0]);
    exit(1);
}

if ((file = fopen(argv[1], "r")) == NULL)
{
    fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
    exit(0);
}

if ((outfile = fopen(argv[2], "w")) == NULL)
{
    fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
    exit(0);
}

num_patterns = atoi(argv[3]);
num_features = atoi(argv[4]);

/* read the feature values of every pattern */
for (i=0; i < num_patterns; i++)
    for (j=0; j < num_features; j++)
        fscanf(file, "%f", &data[i][j]);

/* write each pattern on its own line, preceded by its class id */
for (i=0; i < num_patterns; i++)
{
    if (i != 0) fprintf(outfile, "\n");
    fprintf(outfile, "%d ", class[i]);
    for (j=0; j < num_features; j++)
        fprintf(outfile, "%f ", data[i][j]);
}

fclose(file);
fclose(outfile);

}
/*********************************************
Program: removeclassid.c

Description: Removes classid's from the
feature set.

*********************************************/


#include <stdio.h>
#include <stdlib.h>   /* atoi, exit, malloc, free */

int main(int argc, char *argv[])
{
int i, num_patterns, num_features, size;
char filename[80], *buf, *ptr;

FILE *file, *outfile;


if (argc != 5)
{
    fprintf(stderr, "%s: usage: %s <infile> <outfile> <num_patterns> <num_features>\n", argv[0], argv[0]);
    exit(1);
}

num_patterns = atoi(argv[3]);
num_features = atoi(argv[4]);

size = 15 * num_features + 2;   /* size of the line buffer */

if ((buf = (char *) malloc(size)) == NULL)
{
    fprintf(stderr, "Couldn't allocate %d bytes of storage\n", size);
    exit(0);
}

if ((file = fopen(argv[1], "r")) == NULL)
{
    fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
    exit(0);
}

if ((outfile = fopen(argv[2], "w")) == NULL)
{
    fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
    exit(0);
}

ptr = buf+2;   /* skip the first two characters of each line (the class id and the blank after it) */

for (i=0; i < num_patterns; i++)
{
    if (fgets(buf, size, file) == NULL)
    {
        fprintf(stderr, "PANIC!! Read error or EOF encountered\n");
        break;
    }

    fputs(ptr, outfile);
}

free(buf);

fclose(file);
fclose(outfile);

}
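removeclassid.c drops the class id by copying each line from its third character onward, which works when the id is a single character followed by one blank. The sketch below shows one way to skip the first blank-delimited field whatever its width. It is an alternative offered for illustration, not the method used in the listing, and skip_first_field is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the text that follows the
   first blank-delimited field of line (for example, a class id). */
const char *skip_first_field(const char *line)
{
    const char *p;

    assert(line != NULL);
    p = line;
    p += strspn(p, " \t");      /* skip any leading blanks */
    p += strcspn(p, " \t\n");   /* skip the first field itself */
    p += strspn(p, " \t");      /* skip the blanks after the field */
    return p;
}
```

With this helper the copy loop could write skip_first_field(buf) instead of a pointer at a fixed offset into the buffer.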


/*********************************************
Program: rand4c.c

Description: Randomizes 4 classes of data, which
is in a file format ready for LNKnet, into
training and testing sets. An equal number of
data samples are placed in each set.

*********************************************/


#include <stdio.h>
#include <stdlib.h>   /* atoi, exit, random, srandom */
#include <time.h>     /* time */
#include <sys/types.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
int i, j, k, x, idx[100][100], temp, num_patterns, num_features;
float data[500][500];

FILE *file, *outfile1, *outfile2, *outfile3;


if (argc != 7)
{
    fprintf(stderr, "%s: usage: %s <infile> <trainfile> <testfile> <num_patterns> <num_features> <listfile>\n", argv[0], argv[0]);
    exit(1);
}

if ((file = fopen(argv[1], "r")) == NULL)
{
    fprintf(stderr, "Couldn't open file list %s\n", argv[1]);
    exit(0);
}

if ((outfile1 = fopen(argv[2], "w")) == NULL)
{
    fprintf(stderr, "Couldn't open output file %s\n", argv[2]);
    exit(0);
}

if ((outfile2 = fopen(argv[3], "w")) == NULL)
{
    fprintf(stderr, "Couldn't open output file %s\n", argv[3]);
    exit(0);
}

if ((outfile3 = fopen(argv[6], "w")) == NULL)
{
    fprintf(stderr, "Couldn't open output file %s\n", argv[6]);
    exit(0);
}


/** read in the datafile **/

num_patterns = atoi(argv[4]);
num_features = atoi(argv[5]);

for (i=0; i < num_patterns; i++)
    for (j=0; j < num_features+1; j++)
    {
        if (j==0)
            fscanf(file, "%f", &data[i][j]);   /* class id */
        else
            fscanf(file, "%f", &data[i][j]);   /* feature value */
    }


/** randomize a set of numbers for equal numbers of training and testing **/

for (i=0; i < 4; i++)
{
    srandom((long) time(NULL));
    for (j=0; j<100; j++)
        idx[i][j] = j;

    for (j=0; j<100; j++)
    {
        x = random() % 99;
        temp = idx[i][x];
        idx[i][x] = idx[i][j];
        idx[i][j] = temp;
    }
}

/** print values to the screen for grins **/
/* for (i=1; i<4; i++)
for (j=0;j<100;j++) */


/** write the data to train and test files **/
for (i=0; i<4; i++) /* separate each class */
{
    if (i == 0) /* class one */
    {
        for (j=0; j < 100; j++) /* number of patterns in class one */
        {
            if (j <= 49) /* 50-50 into test and train */
            { /* writing to train file */
                if (j != 0) fprintf(outfile1, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    if (k == 0) fprintf(outfile1, "%d ",
                                        (int) data[idx[i][j]][k]);
                    else
                        fprintf(outfile1, "%f ", data[idx[i][j]][k]);
                }
            }

            if (j >= 50) /* writing to test file */
            {
                if (j != 50) fprintf(outfile2, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    if (k == 0) {fprintf(outfile2, "%d ",
                                         (int) data[idx[i][j]][k]);
                        fprintf(outfile3, "%d\n", (idx[i][j]+1));} /** print list to a file **/
                    else
                        fprintf(outfile2, "%f ", data[idx[i][j]][k]);
                }
            }
        }
    }

    else
    if (i == 1) /* same as above, but for the next class */
    {
        for (j=0; j < 100; j++)
        {
            if (j <= 49)
            {
                fprintf(outfile1, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    /* must account for where the second class is located in the file */
                    if (k == 0) fprintf(outfile1, "%d ",
                                        (int) data[100+idx[i][j]][k]);
                    else
                        fprintf(outfile1, "%f ", data[100+idx[i][j]][k]);
                }
            }

            if (j >= 50)
            {
                fprintf(outfile2, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    if (k == 0) { fprintf(outfile2, "%d ",
                                          (int) data[100+idx[i][j]][k]);
                        fprintf(outfile3, "%d\n", (idx[i][j]+101));}
                    else
                        fprintf(outfile2, "%f ", data[100+idx[i][j]][k]);
                }
            }
        }
    }

    else
    if (i == 2) /* same as above, but for the next class */
    {
        for (j=0; j < 100; j++)
        {
            if (j <= 49)
            {
                fprintf(outfile1, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    /* must account for where the third class is located in the file */
                    if (k == 0) fprintf(outfile1, "%d ",
                                        (int) data[200+idx[i][j]][k]);
                    else
                        fprintf(outfile1, "%f ", data[200+idx[i][j]][k]);
                }
            }

            if (j >= 50)
            {
                fprintf(outfile2, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    if (k == 0) { fprintf(outfile2, "%d ",
                                          (int) data[200+idx[i][j]][k]);
                        fprintf(outfile3, "%d\n", (idx[i][j]+201)); }
                    else
                        fprintf(outfile2, "%f ", data[200+idx[i][j]][k]);
                }
            }
        }
    }

    else
    if (i == 3) /* same as above, but for the next class */
    {
        for (j=0; j < 100; j++)
        {
            if (j <= 49)
            {
                fprintf(outfile1, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    /* must account for where the fourth class is located in the file */
                    if (k == 0) fprintf(outfile1, "%d ",
                                        (int) data[300+idx[i][j]][k]);
                    else
                        fprintf(outfile1, "%f ", data[300+idx[i][j]][k]);
                }
            }

            if (j >= 50)
            {
                fprintf(outfile2, "\n");
                for (k=0; k < num_features+1; k++)
                {
                    if (k == 0) { fprintf(outfile2, "%d ",
                                          (int) data[300+idx[i][j]][k]);
                        fprintf(outfile3, "%d\n", (idx[i][j]+301));}
                    else
                        fprintf(outfile2, "%f ", data[300+idx[i][j]][k]);
                }
            }
        }
    }


} /* end for loop */

} /* end main */
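rand4c.c builds its random ordering by swapping each position with index random() % 99, so index 99 is never chosen as the swap partner. It also reseeds the generator inside the class loop, and since those calls normally fall within the same second, every class will usually receive the same permutation. The sketch below shows the standard Fisher-Yates shuffle as a point of comparison. It is not the procedure used in the listing: shuffle_indices is a hypothetical name, and rand() stands in for random() so that the example depends only on the C standard library.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill perm[0..n-1] with 0..n-1 and shuffle it in place with the
   Fisher-Yates algorithm. The caller seeds the generator once with
   srand() before the first call. */
void shuffle_indices(int *perm, int n)
{
    int i, j, tmp;

    assert(perm != NULL && n > 0);
    for (i = 0; i < n; i++)
        perm[i] = i;

    for (i = n - 1; i > 0; i--) {
        j = rand() % (i + 1);   /* 0 <= j <= i; the modulo adds a slight bias */
        tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }
}
```

Seeding once before the class loop and calling shuffle_indices(idx[i], 100) for each class would give each class its own ordering over all 100 indices.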

/*********************************************
Program: sort_best_feature.c
Description: This program is part
of the script file, for add-on testing.

It finds the best feature during add-on
testing.

*********************************************/

#include <stdio.h>
#include <stdlib.h>   /* exit */

int main()
{
    int i, count=0, feature_number[500];
    float test_error[500];

    FILE *infile, *outfile;

    if ((infile = fopen("test.features", "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", "test.features");
        exit(0);
    }

    if ((outfile = fopen("feature", "a")) == NULL)
    {
        fprintf(stderr, "Couldn't open output file %s\n", "feature");
        exit(0);
    }

    for (i=0; i<500; i++)
        test_error[i] = 1000.00;

    /* read (feature number, test error) pairs, stored from index 1 */
    i=1;
    while (fscanf(infile, "%d %f", &feature_number[i], &test_error[i]) != EOF) {
        i++;
        count++;
    }

    /* for(i=1;i< 8;i++){
    fscanf(infile, "%d", &feature_number[i]);
    fscanf(infile, "%f", &test_error[i]); */

    /* count = 8; */

    /************************* Sort *************************/

    piksr2(count, test_error, feature_number);
    fprintf(outfile, "%d\n", feature_number[1]);

}
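The listing calls piksr2 but does not define it. The name and the use of arrays indexed from 1 match the straight insertion sort of that name in Numerical Recipes in C, which sorts one array into ascending order and rearranges a second array to match. The sketch below is a stand-in written under that assumption, with an int companion array to fit the call in sort_best_feature.c. sort_pairs is a hypothetical name and this is not the routine the author linked against.

```c
#include <assert.h>

/* Sort key[1..n] into ascending order by straight insertion and apply
   the same rearrangement to tag[1..n]. Element 0 of each array is unused. */
void sort_pairs(int n, float key[], int tag[])
{
    int i, j, t;
    float k;

    assert(n >= 0 && key != 0 && tag != 0);
    for (j = 2; j <= n; j++) {
        k = key[j];                 /* pick out the next element */
        t = tag[j];
        i = j - 1;
        while (i > 0 && key[i] > k) {
            key[i + 1] = key[i];    /* shift larger elements up */
            tag[i + 1] = tag[i];
            i--;
        }
        key[i + 1] = k;             /* insert it in its place */
        tag[i + 1] = t;
    }
}
```

After the call, tag[1] holds the companion of the smallest key, which is how the program above picks the feature with the lowest test error.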
/*********************************************
Program: geterror.c


Description: Finds which patterns
were misclassified. This program
is part of the script file getmiss.

*********************************************/


#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>

int main()
{
    int j, how_many;
    float train_err[100], test_err[100], train_rmserr[100], test_rmserr[100];

    FILE *infile, *outfile;

    if ((infile = fopen("error", "r")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", "error");
        exit(0);
    }

    if ((outfile = fopen("error.report", "w")) == NULL)
    {
        fprintf(stderr, "Couldn't open file list %s\n", "error.report");
        exit(0);
    }

    fscanf(infile, "%d", &how_many);

    /* get the error */
    for (j=1; j<how_many/4; j++)
        fscanf(infile, "%f %f %f %f", &train_err[j], &train_rmserr[j], &test_err[j], &test_rmserr[j]);

    piksr2(how_many/4, test_err, train_err);
    piksr2(how_many/4, test_rmserr, train_rmserr);

    fprintf(outfile, "%2.2f %2.2f %2.2f %2.2f", test_err[1], train_err[1], test_rmserr[1], train_rmserr[1]);
} /* end main */